Branching mobile-device to system-namespace identifier mappings

ABSTRACT

Provided is a process of merging data from feeds from multiple sources of computing device network activity data having heterogenous device identifier namespaces and device identifier to device mappings that change over time, the process including: accessing three or more sources of network activity log data from three or more different sources of network activity data, wherein: for each of the sources of network activity log data, based on the respective network activity log data, updating a multi-namespace mapping that maps the external-namespace device identifiers to internal-namespace device identifiers in an internal namespace of a system configured to profile mobile computing devices based on logged network activity data of the mobile computing devices, wherein: the namespace mapping comprises a plurality of external-namespace-specific mappings each mapping a respective type of device identifier in a respective external namespace used in the network activity log data to one or more internal-namespace device identifiers.

CROSS-REFERENCE TO RELATED APPLICATIONS

This patent is a continuation-in-part of U.S. patent application Ser. No. 14/334,066, filed 17 Jul. 2014, titled Matching Anonymized User Identifiers Across Differently Anonymized Data Sets, which claims the benefit of U.S. Provisional Patent Application 61/847,083 filed 17 Jul. 2013, titled “Matching Anonymized User Identifiers Across Differently Anonymized Data Sets”; claims the benefit of U.S. Provisional Patent Application 62/244,767, filed 22 Oct. 2015, titled BRANCHING MOBILE-DEVICE TO SYSTEM-NAMESPACE IDENTIFIER MAPPINGS; and claims the benefit of U.S. Provisional Patent Application 62/244,768, filed 22 Oct. 2015, titled DETECTING INFLUENCERS IN SOCIAL NETWORKS WITH LOCATION DATA, each of which is incorporated by reference.

BACKGROUND

1. Field

The present disclosure relates generally to user profiles and, more specifically, to generating user profiles based on locations identified by matching anonymized user identifiers across differently anonymized data sets.

2. Description of the Related Art

User profiles are useful in a variety of contexts. For example, advertisers often purchase advertising based on a desire to reach potential customers having particular attributes. Such advertisers often employ user profiles to select when, where, or how the advertiser conveys their message. Similarly, market researchers may analyze user profiles to better understand the market for a given good or service based on attributes of buyers of that good or service. In another example, user profiles may be used to customize products or services, for instance, by customizing a software application according to the profile of a user of the software application, or user profiles may be used by governmental agencies to allocate services to geographic areas according to profiles of users in those areas.

User profiles, however, can be difficult to obtain, as users generally have little incentive to generate a profile of themselves for use by others. Such a task can be tedious and unpleasant. Further, users' recollection of their behavior over time can be unreliable.

Instead, advertisers (and other consumers of user profiles) often rely on user profiles generated based on activities of users on various networks or other distributed systems (e.g., cell carriers, ad networks, native applications on smart phones, etc.). Forming such profiles can be difficult, though, because data from individual sources is often insufficient to reliably profile users and the data is often anonymized.

Frequently, available data identifies users uniquely within a given data provider's system, but does not identify users canonically across multiple data provider systems, as each data provider often has a different unique ID for the same user (e.g., a device of a user). This is typically done to comply with privacy policies of the data providers. But as a result, it is difficult to match a record about a user's device from one data provider with a record about the same user's device from another data provider. Also, when users update their equipment, e.g., with a new cell phone, or when a device identifier for a given device is changed by a data provider, it can be difficult to tie a user's existing profile to data from the new equipment, or to tie data under a new identifier to an existing profile mapped to the older identifier, as the third party user identifiers are often based on identifiers of the equipment (e.g., a data-provider-specific hash of a media access control (MAC) address or an advertiser identification number of the device).

SUMMARY

The following is a non-exhaustive listing of some aspects of the present techniques. These and other aspects are described in the following disclosure.

Some aspects include a process of joining data from feeds from multiple sources of computing device network activity data having heterogenous device identifier namespaces and device identifier to device mappings that change over time, the process including: accessing, with one or more processors, three or more sources of network activity log data from three or more different sources of network activity data, wherein: each source of network activity log data describes network activity by more than 100,000 mobile computing devices, each source of network activity log data describes activities over a duration of time longer than one hour, each source of network activity log data provides transaction records of more than 1 million transactions by at least some of the mobile computing devices, each transaction record including one or more external-namespace device identifiers in an external namespace of a respective mobile computing device participating in the respective network transaction, and the transaction records associate geolocations reported by the mobile computing devices with timestamps and external-namespace device identifiers of the mobile computing devices; for each of the sources of network activity log data, based on the respective network activity log data, updating, with one or more processors, a multi-namespace mapping that maps the external-namespace device identifiers to internal-namespace device identifiers in an internal namespace of a system configured to profile mobile computing devices based on logged network activity data of the mobile computing devices, wherein: the namespace mapping comprises a plurality of external-namespace-specific mappings each mapping a respective type of device identifier in a respective external namespace used in the network activity log data to one or more internal-namespace device identifiers, and at least some of the external-namespace device identifiers are mapped in at least some of the external-namespace-specific mappings to a plurality of internal-namespace device identifiers, with a given external-namespace device identifier being mapped to a given plurality of internal-namespace device identifiers; after updating the multi-namespace mapping, receiving, with one or more processors, an external-namespace device identifier; selecting, with one or more processors, one of the external-namespace-specific mappings based on the external namespace of the received external-namespace device identifier; accessing, with one or more processors, a plurality of internal-namespace device identifiers mapped to the received external-namespace device identifier by the selected external-namespace-specific mapping; and accessing, with one or more processors, a device profile associated with at least some of the plurality of internal-namespace device identifiers.
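
By way of illustration only, and not limitation, the lookup portion of the above process might be sketched as follows. The class name `MultiNamespaceMapping`, the helper `profiles_for`, and the namespace labels are hypothetical and are not taken from this disclosure; the sketch assumes an in-memory structure, whereas embodiments operating at the scales described above would likely use a distributed datastore.

```python
from collections import defaultdict


class MultiNamespaceMapping:
    """Hypothetical sketch: one mapping per external namespace, in which a
    single external identifier may branch to several internal identifiers."""

    def __init__(self):
        # namespace name -> external ID -> set of internal IDs
        self._by_namespace = defaultdict(lambda: defaultdict(set))

    def update(self, namespace, external_id, internal_id):
        """Record that external_id (in namespace) maps to internal_id."""
        self._by_namespace[namespace][external_id].add(internal_id)

    def lookup(self, namespace, external_id):
        """Select the namespace-specific mapping, then return every internal
        ID mapped to the external ID (an empty list if none is known)."""
        mapping = self._by_namespace.get(namespace, {})
        return sorted(mapping.get(external_id, ()))


def profiles_for(mapping, profiles, namespace, external_id):
    """Return the device profiles found for the mapped internal IDs."""
    internal_ids = mapping.lookup(namespace, external_id)
    return {iid: profiles[iid] for iid in internal_ids if iid in profiles}
```

In this sketch, the same external identifier string appearing in two different namespaces resolves independently, because the namespace-specific mapping is selected before the identifier is looked up.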

Some aspects include a tangible, non-transitory, machine-readable medium storing instructions that when executed by a data processing apparatus cause the data processing apparatus to perform operations including the above-mentioned process.

Some aspects include a system, including: one or more processors; and memory storing instructions that when executed by the processors cause the processors to effectuate operations of the above-mentioned process.

BRIEF DESCRIPTION OF THE DRAWINGS

The above-mentioned aspects and other aspects of the present techniques will be better understood when the present application is read in view of the following figures in which like numbers indicate similar or identical elements:

FIG. 1 shows an example of an environment in which a user profiler operates in accordance with some embodiments;

FIG. 2 shows an embodiment of a process for profiling a user;

FIG. 3 shows an embodiment of another process for profiling a user;

FIG. 4 shows an example of a user profiler configured to match non-canonical user identifiers in accordance with some embodiments;

FIG. 5 shows an embodiment of a process for matching non-canonical user identifiers;

FIG. 6 shows an embodiment of a process for forming branching system(internal)-namespace device identifiers mapped to non-system(external)-namespace device identifiers;

FIG. 7 shows an example of a data structure upon which the process of FIG. 6 operates;

FIG. 8 shows an example of an environment in which a user profiler operates in accordance with some embodiments; and

FIG. 9 shows an embodiment of a computer system by which the above-mentioned systems and processes may be implemented.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. The drawings may not be to scale. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but to the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

To mitigate the problems described herein, the inventors had to both invent solutions and, in some cases just as importantly, recognize problems overlooked (or not yet foreseen) by others in the fields of computer science and geolocation analytics. Indeed, the inventors wish to emphasize the difficulty of recognizing those problems that are nascent and will become much more apparent in the future should trends in industry continue as the inventors expect. Further, because multiple problems are addressed, it should be understood that some embodiments are problem-specific, and not all embodiments address every problem with traditional systems described herein or provide every benefit described herein. That said, improvements that solve various permutations of these problems are described below.

Correlating mobile computing device identifiers is a fundamental challenge in the network data analytics industry. The way people use their computing devices on computer networks (e.g., which content they request or view and when and where they use their device) is the bedrock upon which are built many of the statistical inferences driving content recommendation, AI personal assistants, advertisement selection, and many forms of site selection and municipal design. Yet matching records of network usage to devices, in a way that is consistent over time, is a persistent challenge. This is due, in part, to the scale of data at issue, often with billions or hundreds of billions of records of network transactions in many data sets. Many computer systems cannot process the relevant analyses at these scales in reasonable run-times. This is also due to changes in the device identifiers over time and different data providers using different identifiers for the same computing device (without indicating correlations therebetween). Due to these challenges, the ability to intelligently manage these namespaces and relationships therebetween is expected to become a key differentiator in driving demand for geolocation analytics systems, as namespace management lies upstream of many analyses that are much more highly valued if based on accurate data.

Below, techniques are described to mitigate these problems. FIGS. 1-4 describe a first set of techniques for profiling places and people; FIG. 5 describes a second set of techniques for dealing with anonymized user (e.g., device) identifiers; and FIGS. 6-7 describe another set of techniques for managing a changing representation of device identifier mappings over time. It should be emphasized that several inventions are described, and that while the inventions can be used synergistically together, they are also independently useful.

FIG. 1 shows an example of a computing environment 10 having a user profiler 12 operative to generate user profiles to be stored in a user-profile datastore 14. In some embodiments, the user profiler 12 generates (for example, instantiates or updates) the user profiles based on location histories from mobile devices 16 and attributes of geographic locations stored in a geographic information system 18. The resulting user profiles may reflect the attributes of the locations visited by users. The location histories may be conveyed via the Internet 20 to remote locations, and the user profiles may be used by advertisement servers 22 to select advertisements for presentation to users or for other purposes described below.

The profiles may characterize a variety of attributes of users. In one illustrative use case, a location history may indicate that a user frequently visits geographic locations associated with tourism, and the profile of that user may be updated to indicate that the user frequently engages in tourism, which may be of interest to certain categories of advertisers. Or a user may spend their working hours in geographic areas associated with childcare and residences, and based on their location history, the profile of that user may be updated to indicate that the user likely engages in childcare for children younger than school age. Other examples are described below.

Further, as explained in detail below, the attributes associated with geographic locations may vary over time (for example, an area with coffee shops and bars may have a stronger association with consumption of breakfast or coffee in the morning, an association which weakens in the evening, while an association with entertainment or nightlife may be weaker in the morning and stronger in the evening). User profiles may be generated in accordance with the time-based attributes that predominate when the user is in a geographic area. And in some embodiments, user profiles may also be segmented in time, such that a portion of a given user's profile associated with a weekday morning may have different attributes than another portion of that user's profile associated with a weekend night, for instance.

The user profiles may be used by advertisers and others in a privacy-friendly fashion, such that users are expected to tend to opt in to sharing their location history. For example, the user profiles may be aggregated to identify geographic areas having a high density of a particular type of user at a particular time of the week, such as a sports stadium having a relatively large number of users associated with fishing as a hobby, or a children's soccer field in which a relatively large number of people associated with golfing as a hobby might tend to co-occur on weekend mornings. Such correlations may be presented to advertisers or others without disclosing information by which individual users can be uniquely identified. In other applications, user-specific information may be provided, for example, users who opt in to sharing their profiles may receive user-specific services or communications formulated based on the individual profile of that user.

Accounting for time when characterizing geographic areas is believed to yield relatively accurate characterizations of places, as the activities that people engage in at a given location tend to depend strongly on time of day and week. And for similar reasons, accounting for time when profiling users is expected to yield relatively accurate characterizations of the users. Generating profiles based on location history further offers the benefit of profiling users without imposing the burden of manually doing so on the users themselves, and using attributes of geographic areas in which the user travels is expected to yield relatively privacy-friendly data about the user. That said, not all embodiments offer all, or any, of these benefits, as various engineering and cost trade-offs are envisioned, and other embodiments may offer other benefits, some of which are described below.

As noted above, the user profiler 12 obtains data from the mobile devices 16 and the geographic information system 18 to output user profiles to the user-profile datastore 14 for use by the ad servers 22 or for other purposes. Accordingly, these components are described in this sequence, starting with inputs, and concluding with outputs.

The mobile devices 16 may be any of a variety of different types of computing devices having an energy storage device (e.g., a battery) and being capable of communicating via a network, for example via a wireless area network or a cellular network connected to the Internet 20. In some cases, the mobile devices 16 are handheld mobile computing devices, such as smart phones, tablets, or the like, or the mobile devices may be laptop computers or other special-purpose computing devices, such as an automobile-based computer (e.g., an in-dash navigation system). The mobile devices 16 may have a processor and a tangible, non-transitory machine-readable memory storing instructions that provide the functionality described herein when executed by the processor. The memory may store instructions for an operating system, special-purpose applications (apps), and a web browser, depending upon the use case. It should be noted, however, that the present techniques are not limited to mobile devices, and other computing devices subject to geolocation may also generate data useful for forming user profiles. For instance, set-top boxes, gaming consoles, or Internet-capable televisions may be geolocated based on IP address, and data from user interactions with these devices may be used to update user profiles, e.g., with user interaction indicating a time at which a user was at the geolocation corresponding to the device.

This software may have access to external or internal services by which the location of the mobile device may be obtained. For example, the mobile device may have a built-in satellite-based geolocation device (for instance a global-positioning system, or GPS, device or components operative to obtain location from other satellite-based systems, such as Russia's GLONASS system or the European Union's Galileo system). In another example, location may be obtained based on the current wireless environment of the mobile device, for example by sensing attributes of the wireless environment (e.g., SSIDs of wireless hotspots, identifiers of cellular towers and signal strengths, identifiers of low energy Bluetooth beacons, and the like) and sending those attributes to a remote server capable of identifying the location of the mobile device. In some embodiments, the location may be obtained based on an identifier of a network node through which the mobile device connects to the Internet, for example by geocoding an IP address of a wireless router or based on a location of a cellular tower to which the mobile device is connected. The location may be expressed as a latitude and longitude coordinate or an area, and in some cases may include a confidence score, such as a radius or bounding box defining an area within which the device is expected to be with more than some threshold confidence.

From time to time, the location of the mobile devices 16 may be obtained by the mobile devices. For example, when a user interacts with a special-purpose application, in some cases, the application may have permission to obtain the location of the mobile device and report that location to a third party server associated with the application, such that the location may be obtained by the user profiler 12 from the third party server. In another example, the user may visit a website having code that obtains the current location of the mobile device. This location may be reported back to the server from which the website was obtained or some other third party server, such as an ad server for an affiliate network, and location histories may be obtained from this server. In another example, locations of the mobile devices 16 may be obtained without the participation of the mobile device beyond connecting to a network. For instance, users may opt in to allowing a cellular service provider to detect their location based on cellular signals and provide that location to the user profiler 12. Depending upon how location is obtained, the location may be acquired intermittently, for example at three different times during a day when a user launches a particular application, or relatively frequently, for example by periodically polling a GPS device and reporting the location. In some cases, the location history may include locations obtained more than one second apart, more than one minute apart, more than one hour apart, or more, depending upon the use case.

Locations may be obtained in real time from mobile devices 16 by the user profiler 12, or in some embodiments, location histories may be obtained, e.g., from third party data providers using the process and systems described below with reference to FIGS. 4-5. Each location history may include records of geographic locations of a given mobile device and when the mobile device was at each location. In some cases, a location history may include records of location over a relatively long duration of time, such as over more than a preceding hour, day, week, or month, as some modes of acquiring location histories report or update location histories relatively infrequently. A location history for a given mobile device may include a plurality (e.g., more than 10 or more than 100) of location records, each location record corresponding to a detected location of the mobile device, and each location record including a geographic location and the time at which the mobile device was at the location. The location records may also include a confidence score indicative of the accuracy of the detected location. Geographic locations may be expressed in a variety of formats with varying degrees of specificity, for example as latitude and longitude coordinates, as tiles in a grid with which a geographic area is segmented (e.g., quantized), or in some other format for uniquely specifying places.
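
For purposes of illustration only, a location record and a location history of the kind described above might be represented as in the following sketch. The names `LocationRecord` and `reliable_records`, the use of a confidence radius in meters as the confidence score, and the 200-meter default cutoff are assumptions made for this example rather than features of the disclosed embodiments.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LocationRecord:
    """One detected location of a device (hypothetical field names)."""
    latitude: float
    longitude: float
    observed_at: datetime
    confidence_radius_m: float  # smaller radius = more accurate fix


def reliable_records(history, max_radius_m=200.0):
    """Keep records whose confidence radius is within the cutoff,
    ordered by the time at which the device was at the location."""
    kept = [r for r in history if r.confidence_radius_m <= max_radius_m]
    return sorted(kept, key=lambda r: r.observed_at)
```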

The geographic information system 18 may be configured to provide information about geographic locations in response to queries specifying a location of interest. In some embodiments, the geographic information system 18 organizes information about a geographic area by quantizing (or otherwise dividing) the geographic area into area units, called tiles, that are mapped to subsets of the geographic area. In some cases, the tiles correspond to square units of area having sides that are between 10 meters and 1000 meters long, for example approximately 100 meters per side, depending upon the desired granularity with which a geographic area is to be described. In other examples, the tiles have other shapes, e.g., hexagon shapes that are arranged in a two-dimensional hexagonal packing layout.
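
As a non-limiting sketch, quantizing a latitude and longitude coordinate into an approximately square tile might be performed as follows. The function name `tile_id`, the constant of roughly 111,320 meters per degree of latitude, and the equirectangular approximation are assumptions introduced for this example; the approximation degrades near the poles, and the disclosure does not specify a particular projection.

```python
import math

# Approximate length of one degree of latitude, in meters (an assumption
# suitable for a sketch; a production system might use a proper projection).
METERS_PER_DEGREE = 111_320.0


def tile_id(lat, lon, tile_size_m=100.0):
    """Map a coordinate to a (row, column) identifier of a square tile."""
    row = math.floor(lat * METERS_PER_DEGREE / tile_size_m)
    # Scale longitude using the latitude of the row's southern edge so that
    # every point in the same row shares a single east-west scale.
    row_lat = row * tile_size_m / METERS_PER_DEGREE
    meters_per_degree_lon = METERS_PER_DEGREE * math.cos(math.radians(row_lat))
    col = math.floor(lon * meters_per_degree_lon / tile_size_m)
    return (row, col)
```

Because the identifier in this sketch is derived directly from the tile's position, the identifier itself can serve as the indication of the geographic area covered by the tile.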

In some cases, the attributes of a geographic area change over time. Accordingly, some embodiments divide each tile according to time. For instance, some embodiments divide each tile into subsets of some period of time, such as one week, one month, or one year, and attributes of the tile are recorded for subsets of that period of time. For example, the period of time may be one week, and each tile may be divided by portions of the week selected in view of the way users generally organize their week, accounting, for instance, for differences between work days and weekends, work hours, after work hours, mealtimes, typical sleep hours, and the like. Examples of such time divisions may include a duration for a tile corresponding to Monday morning from 6 AM to 8 AM, during which users often eat breakfast and commute to work, 8 AM till 11 AM, during which users often are at work, 11 AM till 1 PM, during which users are often eating lunch, 1 PM till 5 PM, during which users are often engaged in work, 5 PM till 6 PM, during which users are often commuting home, and the like. Similar durations may be selected for weekend days, for example 8 PM till midnight on Saturdays, during which users are often engaged in leisure activities. Each of these durations may be profiled at each tile.
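
By way of example only, assigning a timestamp to one of the above divisions of the week might be sketched as follows. The window boundaries follow the examples given above, while the window labels, the application of the Monday windows to every work day and of the Saturday window to both weekend days, and the catch-all "other" window are assumptions of this sketch.

```python
from datetime import datetime

# (start hour inclusive, end hour exclusive, label); labels are hypothetical.
WEEKDAY_WINDOWS = [
    (6, 8, "breakfast_commute"),
    (8, 11, "work_morning"),
    (11, 13, "lunch"),
    (13, 17, "work_afternoon"),
    (17, 18, "commute_home"),
]
WEEKEND_WINDOWS = [(20, 24, "evening_leisure")]


def time_window(ts):
    """Return (day of week as 0=Monday..6=Sunday, window label) for ts."""
    day = ts.weekday()
    windows = WEEKEND_WINDOWS if day >= 5 else WEEKDAY_WINDOWS
    for start, end, label in windows:
        if start <= ts.hour < end:
            return (day, label)
    return (day, "other")  # hours not covered by a named window
```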

In some embodiments, the geographic information system 18 includes a plurality of tile records, each tile record corresponding to a different subset of a geographic area. Each tile record may include an identifier, an indication of geographic area corresponding to the tile (which for regularly sized tiles may be the identifier), and a plurality of tile-time records. Each tile-time record may correspond to one of the above-mentioned divisions of time for a given tile, and the tile-time records may characterize attributes of the tile at different points of time, such as during different times of the week. Each tile-time record may also include a density score indicative of the number of people in the tile at a given time. In some embodiments, each tile-time record includes an indication of the duration of time described by the record (e.g., lunch time on Sundays, or dinnertime on Wednesdays) and a plurality of attribute records, each attribute record describing an attribute of the tile at the corresponding window of time during some cycle (e.g., weekly).

The attributes may be descriptions of activities in which users engage that are potentially of interest to consumers of the user-profile datastore 14. For example, some advertisers may be interested in when and where users go to particular types of restaurants, when and where users play golf, when and where users watch sports, when and where users fish, or when and where users work in particular categories of jobs. In some embodiments, each tile-time record may include a relatively large number of attribute records, for example more than 10, more than 100, more than 1000, or approximately 4000 attribute records, depending upon the desired specificity with which the tiles are to be described. Each attribute record may include an indicator of the attribute being characterized and an attribute score indicating the degree to which users tend to engage in activities corresponding to the attribute in the corresponding tile at the corresponding duration of time. In some cases, the attribute score (or tile-time record) is characterized by a density score indicating the number of users expected to engage in the corresponding activity in the tile at the time.

Thus, to use some embodiments of the geographic information system 18, a query may be submitted to determine what sort of activities users engage in at a particular block in downtown New York during Friday evenings, and the geographic information system 18 may respond with the attribute records corresponding to that block at that time. Those attribute records may indicate a relatively high attribute score for high-end dining, indicating that users typically go to restaurants in this category at that time in this place, and a relatively low attribute score for playing golf, for example. Attribute scores may be normalized, for example to a value from 0 to 10, with a value indicating the propensity of users to exhibit behavior described by that attribute.
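
Merely as an illustrative sketch, tile records, tile-time records, and a query against them might be arranged as follows. The names `TileTimeRecord`, `TileRecord`, and `query_attributes`, and the use of in-memory dictionaries, are hypothetical choices made for this example and do not describe the actual interface of the geographic information system 18.

```python
from dataclasses import dataclass, field


@dataclass
class TileTimeRecord:
    """Attributes of one tile during one window of a weekly cycle."""
    window: str                    # e.g., "friday_evening" (hypothetical label)
    density: float = 0.0           # indicative of the number of people present
    attributes: dict = field(default_factory=dict)  # attribute -> score, 0 to 10


@dataclass
class TileRecord:
    """One tile and its tile-time records, keyed by window label."""
    tile_id: tuple
    tile_times: dict = field(default_factory=dict)


def query_attributes(tiles, tile_id, window):
    """Return a copy of the attribute scores for a tile at a window,
    or an empty dict when the tile or window is unknown."""
    tile = tiles.get(tile_id)
    if tile is None:
        return {}
    record = tile.tile_times.get(window)
    return dict(record.attributes) if record is not None else {}
```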

The user profiler 12 may join the location histories and tile records implicated by locations in those location histories to generate user profiles. Thus, users may be characterized according to the attributes of the places those users visit at the times the users visit those places. The generated user profiles may then be stored by the user profiler 12 in the user-profile datastore 14, as described below. To this end, or others, some embodiments of the user profiler 12 include a location-history acquisition module 24, a location-attribute acquisition module 26, and a user-attribute updater 28 operative to generate user profiles.

The user profiler 12 may be constructed from one or more of the computers described below with reference to FIG. 9. These computers may include a tangible, non-transitory, machine-readable medium, such as various forms of memory storing instructions that when executed by one or more processors of these computers (or some other data processing apparatus) cause the computers to provide the functionality of the user profiler 12 described herein. The components of the user profiler 12 are illustrated as discrete functional blocks, but it should be noted that the hardware and software by which these functional blocks are implemented may be differently organized, for example, code or hardware for providing this functionality may be intermingled, subdivided, conjoined, or otherwise differently arranged.

The illustrated location-history acquisition module 24 may be configured to acquire location histories of mobile devices 16 via the Internet 20. The location histories may be acquired directly from the mobile devices 16, or the location histories may be acquired from various third parties, such as third parties hosting Web applications rendered on the mobile devices 16, third parties hosting servers to which location histories are communicated by apps on the mobile devices 16, or third parties providing network access to the mobile devices 16, such as cellular service providers, for example. The location-history acquisition module 24 may include a plurality of sub-modules for obtaining location histories from a plurality of different providers. These sub-modules may be configured to request, download, and parse location histories from a respective one of the different providers via application program interfaces provided by those providers. The sub-modules may normalize the location histories from the different providers, which may be in different formats, into a common format for use in subsequent processing. Location histories may be acquired periodically, for example monthly, weekly, or hourly, or more frequently.

The user profiler 12 of this embodiment further includes the location-attribute acquisition module 26. The module 26 may be configured to obtain attributes of locations identified based on the location histories acquired by the location-history acquisition module 24. For example, the module 26 may be configured to iterate through each location identified by each location history and query the geographic information system 18 for attributes of those locations at the time at which the user was at the corresponding location. In some cases, the location-attribute acquisition module 26 may also request attributes of adjacent locations, such as adjacent tiles, from the geographic information system 18 so that the user-attribute updater 28 can determine whether a signal from a given tile is consistent with that of surrounding tiles for assessing the reliability of various indications.

The acquired location histories and location attributes may be provided by modules 24 and 26 to the user-attribute updater 28, which, in some embodiments, is configured to generate user profiles based on this data. In some cases, the user-attribute updater 28 is operative to perform portions of the processes of FIG. 2 or 3, described in detail below, and attach attributes of places visited by users to the profile of those users. These profiles may be stored by the user-attribute updater 28 in the user-profile datastore 14.
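
As one hedged illustration of joining location histories with tile attributes, the following sketch averages, per time window, the attribute scores of the tiles a user visited. The function name `build_profile`, the input shapes, and the choice of a simple unweighted mean are assumptions of this example; the processes of FIG. 2 or 3 may combine scores differently.

```python
from collections import defaultdict


def build_profile(visits, tile_attributes):
    """Join visits with tile attributes and average scores per window.

    visits: iterable of (tile_id, window) pairs from a location history.
    tile_attributes: dict mapping (tile_id, window) to {attribute: score}.
    Returns {window: {attribute: mean score over matching visits}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for tile_id, window in visits:
        # Visits to tiles with no recorded attributes contribute nothing.
        for attr, score in tile_attributes.get((tile_id, window), {}).items():
            sums[window][attr] += score
            counts[window][attr] += 1
    return {
        window: {attr: total / counts[window][attr] for attr, total in attrs.items()}
        for window, attrs in sums.items()
    }
```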

The user-profile datastore 14 may be operative to store user profiles and, in some embodiments, address queries for data in the user profiles. The illustrated user-profile datastore 14 includes a plurality of user-profile records, each record corresponding to the profile of a given user or a given mobile device 16. Each user-profile record may include an identifier of the record (which may be a value otherwise uncorrelated with the identity of the user to enhance privacy), and an identifier of the source or sources of the location histories from which the profile was created such that subsequent location histories can be matched with the profile (e.g., an account associated with a special-purpose application, a cell phone number, or some other value, which may be hashed to enhance user privacy).

Each user-profile record may also include a plurality of profile-time records indicating attributes of the user profile at different times during some cycle of time (e.g., portions of the week or month, or during other periods like those described above with reference to the geographic information system 18). In some cases, the profile-time records may correspond to the same durations of time as those of the time-tile records described above. Each profile-time record may include an indication of the duration of time being described (e.g., Thursdays at dinnertime, or Saturday midmorning) and a plurality of profile-attribute records, each profile-attribute record indicating the propensity of the corresponding user to engage in an activity described by the attribute during the corresponding time of the profile-time record. The profile-time records may allow tracking of when users tend to engage in a given activity (time of day, day of week, week of year). In some embodiments, the profile-attribute records correspond to the same set of attribute records described above with reference to the geographic information system 18. Each profile-attribute record may include an indication of the attribute being characterized (e.g., attending a children's soccer game, having brunch at a fast-casual dining establishment, parent running errands, or shopping at a mall) and a score indicating the propensity of the user to engage in the activity at the corresponding time, such as a normalized value from 0 to 10. The attribute records may further include a sample size, indicative of the number of samples upon which the attribute score is based, for weighting new samples, and a measure of variance among these samples (e.g., a standard deviation) for identifying outliers.
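By way of illustration only, the record layout described above might be sketched in Python as follows. The class names, field names, and the score() helper are hypothetical and are not drawn from the disclosure; the sketch merely assumes that profile-time records are keyed by a time-bin label and that profile-attribute records are keyed by attribute name.

```python
from dataclasses import dataclass, field


@dataclass
class ProfileAttributeRecord:
    attribute: str         # e.g., "shopping at a mall"
    score: float           # propensity, e.g., normalized to the range 0 to 10
    sample_size: int = 0   # number of samples behind the score
    std_dev: float = 0.0   # spread of those samples, for identifying outliers


@dataclass
class ProfileTimeRecord:
    time_bin: str          # e.g., "saturday_midmorning" (hypothetical label)
    attributes: dict = field(default_factory=dict)  # attribute name -> record


@dataclass
class UserProfileRecord:
    record_id: str         # value uncorrelated with the user's identity
    source_ids: list       # hashed identifiers from location-history sources
    time_records: dict = field(default_factory=dict)  # time bin -> record

    def score(self, time_bin, attribute, default=0.0):
        """Return the propensity score for an attribute in a time bin."""
        time_record = self.time_records.get(time_bin)
        if time_record is None:
            return default
        record = time_record.attributes.get(attribute)
        return default if record is None else record.score
```

Keying both levels by label keeps a lookup for a given time bin and attribute to two dictionary accesses, which is one plausible arrangement among many.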

As described below, the user-profile records may be used for a variety of purposes. For example, advertisers operating ad servers 22 may submit to the user-profile datastore 14 a query identifying one of the user-profile records, such as the above-mentioned hashed value of a user account number or phone number, and the user-profile datastore 14 may respond with the attributes of the corresponding user at the current time. In some embodiments, to further enhance user privacy, queries may be submitted for a specific attribute to determine whether to serve an advertisement corresponding to the attribute, or a query may request a binary indication of whether the attribute score is above a threshold.

In another example, the user-profile datastore 14 may be used by the user profiler 12 to augment the records in the geographic information system 18. For example, an index may be created for each attribute that identifies tiles where users having relatively strong scores (e.g., above a threshold) for the respective attribute tend to co-occur at given times. These indices may correspond to heat maps (though no visual representation need be created) indicating where, for example, users interested in golf tend to be during various times of the day, such that advertisers can select advertisements based on this information. In some embodiments, an index may be created for each user attribute at each of the above-described divisions of time in the geographic information system 18, and these indices may be queried to provide relatively prompt responses relating to where users having a given attribute or combination of attributes tend to co-occur at various times. Precalculating the indices is expected to yield faster responses to such queries than generating responsive data at the time the query is received. For instance, using examples of these indices relating to fishing and employment in banking, an advertiser may determine that people who engage in fishing on the weekend and work in banking tend to drive relatively frequently along a particular stretch of road on Mondays during the evening commute, and that advertiser may purchase an advertisement for bass fishing boats on a billboard along that road in response. Other examples relating to customization of software and services and other forms of analysis are described in greater detail below.

In short, some embodiments of the computing environment 10 generate user profiles that are relatively privacy-friendly to users and consume relatively little effort on the part of users or others to create the profiles. These advantages are expected to yield a relatively comprehensive set of relatively high-resolution user profiles that may be used by advertisers and others seeking to provide information and services customized to the unique attributes of each user, facilitating the presentation of high-value and high-relevance advertisements and services to users while respecting the users' interest in privacy. That said, not all embodiments provide these benefits, and some embodiments may forgo some or all of these benefits in the interest of various engineering trade-offs relating to time, cost, and features.

FIG. 2 illustrates an embodiment of a process 30 that may be performed by the above-described user profiler 12. The steps of the process 30 (and other processes described herein) may be performed in a different order than the order in which the steps are described. In some embodiments, the process 30 includes obtaining a location history of a user, as illustrated by block 32. This step may be performed by the above-described location-history acquisition module 24. As noted above, location histories may be obtained from a plurality of different providers of location histories, and the location histories may be reformatted into a common format for subsequent processing.

The process 30 of this embodiment further includes obtaining a location-attribute score of a location identified in, or inferred from, the location history, as indicated by block 34. This step may be performed by the above-described location-attribute acquisition module 26. The location-attribute score may be one of a plurality of scores corresponding to a time-tile record described above.

In some embodiments, locations identified in the location history may be relatively sparse, and intermediate locations between those explicitly identified may be inferred. For example, the user profiler 12 may determine that two locations are more than a threshold amount of time apart and a threshold distance apart, indicating that the user likely traveled between the locations during the intermediate time. In response, the user profiler 12 may query the geographic information system 18 for locations associated with travel, such as locations corresponding to an interstate highway, between the two locations, and the locations along the interstate highway (or associated with some other mode of travel) may be added to the location history at the intermediate times as inferred locations. Inferring intermediate locations is expected to yield a more comprehensive characterization of the user's profile.
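One way the gap test and the insertion of inferred locations might look is sketched below in Python. This is a sketch under stated assumptions, not the disclosed implementation: the function names are hypothetical, the travel_lookup callback merely stands in for a query to the geographic information system 18, distances use a great-circle approximation, and inferred timestamps are spaced evenly across the gap.

```python
import math


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def infer_intermediate(history, min_gap_s, min_gap_m, travel_lookup):
    """Insert inferred travel locations into gaps of a location history.

    history: list of (timestamp_s, lat, lon), sorted by timestamp.
    travel_lookup(start, end): hypothetical stand-in for a GIS query that
        returns an ordered list of (lat, lon) travel locations between
        the two endpoints.
    Returns a list of (timestamp_s, lat, lon, inferred_flag).
    """
    result = []
    for prev, nxt in zip(history, history[1:]):
        result.append((*prev, False))
        dt = nxt[0] - prev[0]
        dist = haversine_m(prev[1:], nxt[1:])
        # A gap qualifies only when it exceeds both thresholds.
        if dt > min_gap_s and dist > min_gap_m:
            waypoints = travel_lookup(prev[1:], nxt[1:])
            for i, (lat, lon) in enumerate(waypoints, start=1):
                t = prev[0] + dt * i / (len(waypoints) + 1)
                result.append((t, lat, lon, True))
    if history:
        result.append((*history[-1], False))
    return result
```

Flagging inferred entries lets downstream steps weight them differently from measured locations, should an embodiment choose to do so.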

In some embodiments, the process 30 further includes determining a user-attribute score based on the location-attribute score, as indicated by block 36. Determining a user-attribute score may include incrementing a sample size for the corresponding attribute in the user profile and calculating an updated average attribute score. An average is one of a variety of different forms of measures of central tendency that may be used to determine the user-attribute score. In other embodiments, previous attribute scores of locations visited by the user may be stored in memory, and a median or mode score may be calculated using the newly obtained location-attribute score and those stored in memory. Thus, deviations indicating a one-time instance in which the user engaged in a particular activity will tend to have a relatively small effect on the user profile, as previous location histories will likely indicate a relatively low propensity to engage in that activity and dilute the effect of a single instance.

In some embodiments, the process 30 further includes storing the user-attribute score in a user-profile datastore, as indicated by block 38. As noted above, this may include updating indices corresponding to various attributes in a geographic information system, and the stored user profiles may be queried by advertisers and others seeking to provide targeted messaging and services. Targeting may be toward specific users who are profiled or to the places profiled users visit or based on patterns in attribute scores among profiled users.

FIG. 3 illustrates another embodiment of a process 40 for generating user profiles. This process may be performed by the above-mentioned user profiler 12. The illustrated process begins with receiving a location history of a mobile user device from an application executed on the device, as illustrated by block 42. Next, intermediate locations are inferred between locations in the location history, as indicated by block 44. As noted above, some embodiments may identify gaps in the location history and infer intermediate locations based on attributes of tiles between the boundaries of the gap, selecting, for example, intermediate tiles associated with transit. In this embodiment, the process 40 includes determining whether there are any un-analyzed locations in the location history, as indicated by block 46. Upon determining that no such locations remain, the process completes. Alternatively, upon determining that locations remain to be analyzed, the process 40 in this embodiment proceeds to identify a time-bin corresponding to a time-stamp of the next location in the location history, as indicated by block 48. The time-bin may be one of the above-described durations of time by which tiles or user profiles are characterized. The corresponding time-bin may be the time-bin in which the time-stamp of the user location falls. Next, this embodiment of process 40 includes querying a geographic information system for tile records of the tiles corresponding to the location and adjacent tiles, as indicated by block 50.
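The time-bin identification of block 48 might, for example, be sketched in Python as follows. The day-part boundaries and labels below are assumptions chosen for illustration; the disclosure does not fix particular bin boundaries, and the function name time_bin is hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical day parts: (first hour of the part, label), in ascending order.
DAY_PARTS = [
    (0, "overnight"),
    (6, "morning"),
    (11, "midday"),
    (14, "afternoon"),
    (17, "dinnertime"),
    (21, "late_evening"),
]

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def time_bin(timestamp_s, tz=timezone.utc):
    """Map a Unix time-stamp to a (weekday, day part) time-bin."""
    dt = datetime.fromtimestamp(timestamp_s, tz)
    part = DAY_PARTS[0][1]
    for start_hour, label in DAY_PARTS:
        if dt.hour >= start_hour:
            part = label  # keep the last part whose start has passed
    return (WEEKDAYS[dt.weekday()], part)
```

In practice the time zone passed to the function would presumably be that of the tile in question, so that "dinnertime" reflects local time rather than UTC.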

In some embodiments, the process 40 further includes determining whether the user is likely at an adjacent location, as indicated by block 52. Such a determination may include making the determination based on the attributes of the adjacent tiles or density scores for the tiles corresponding to the timestamp of the user location in question. For example, attribute scores for the location in the location history may indicate that less than a threshold amount of user activity occurs within the tile corresponding to that location (e.g., a density value indicative of the number of people in the tile may be below a threshold), while attribute scores for one of the adjacent tiles may indicate a relatively high density or amount of activity (e.g., more than a threshold, or more than a threshold difference from the adjacent tile) for one or more attributes. In response to this difference, it may be determined that the location in the location history is in error (for instance, due to an inaccurate GPS reading), and the adjacent location may be selected as being a more likely location of the user. Some embodiments may select the adjacent location having the highest density or aggregation of attribute scores, for example. Thus, some embodiments may designate the adjacent location as the user location, as indicated by block 54, or in response to a negative determination in block 52, some embodiments may proceed to the next step without such a designation occurring.
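The determination of blocks 52 and 54 could be approximated, under stated assumptions, by the Python sketch below. The function name, the threshold parameters, and the use of a single density score per tile are illustrative choices rather than features recited above.

```python
def resolve_location(reported_tile, adjacent_tiles, density,
                     low_density=5.0, min_difference=20.0):
    """Pick the tile most likely to contain the user.

    density: dict mapping tile id -> density score for the time bin of the
        reported location (missing tiles are treated as having density 0).
    low_density, min_difference: hypothetical thresholds.
    """
    reported = density.get(reported_tile, 0.0)
    # Keep the reported tile when it shows enough activity on its own.
    if reported >= low_density or not adjacent_tiles:
        return reported_tile
    # Otherwise consider the busiest adjacent tile.
    best = max(adjacent_tiles, key=lambda tile: density.get(tile, 0.0))
    if density.get(best, 0.0) - reported > min_difference:
        return best
    return reported_tile
```

For example, a reading placed in a nearly empty tile next to a crowded stadium tile would be reassigned to the stadium tile, while a reading in a moderately busy tile would be left alone.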

Embodiments may further include determining whether adjacent tiles have similar location-attribute scores, as indicated by block 56. Because location measurements by mobile devices are often relatively inaccurate, there is some risk that the user is not at the location identified and is instead in an adjacent tile. However, if the adjacent tiles have similar attribute scores, those attributes can be attributed to the user with a relatively high degree of confidence regardless of whether the user is in the exact tile identified by the location in the location history. Accordingly, some embodiments may determine whether the adjacent tiles have similar location-attribute scores (at the time in question for the user location), for example attribute scores less than a threshold difference for more than a threshold number of attributes. Other embodiments may calculate a confidence score based on the similarity of adjacent tiles and weight the modification of the user profile based on the confidence score, down-weighting signals in instances in which the adjacent tiles are relatively different from one another, or a binary determination may be made as illustrated in FIG. 3. Upon determining that adjacent tiles do not have similar location-attribute scores, some embodiments return to block 46. Alternatively, the process may proceed to the next step.
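Both the binary determination of block 56 and the confidence-score variant might be sketched in Python as follows. The function names and default thresholds are hypothetical, and the confidence score shown (the fraction of attributes on which the tiles agree) is only one of many measures an embodiment might use.

```python
def matching_attributes(center, neighbors, max_diff):
    """Count attributes whose score in every neighbor is within max_diff
    of the score in the center tile.

    center: dict mapping attribute -> score for the reported tile.
    neighbors: list of such dicts, one per adjacent tile.
    """
    return sum(
        1
        for attr, score in center.items()
        if all(abs(n.get(attr, 0.0) - score) < max_diff for n in neighbors)
    )


def adjacent_tiles_similar(center, neighbors, max_diff=1.0, min_attributes=2):
    """Binary test: more than min_attributes attributes agree."""
    return matching_attributes(center, neighbors, max_diff) > min_attributes


def similarity_confidence(center, neighbors, max_diff=1.0):
    """Confidence in [0, 1]: fraction of attributes on which tiles agree."""
    if not center:
        return 0.0
    return matching_attributes(center, neighbors, max_diff) / len(center)
```

The confidence value could then be used as a weight on the profile update, so that signals from tiles surrounded by dissimilar neighbors count for less.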

Some embodiments of process 40 may include determining whether the location-attribute score is an outlier for the user, as indicated by block 58. This step may include iterating through each location-attribute score of the user's location and comparing that attribute score at the time in question to a corresponding attribute score in the user's profile to identify uncharacteristic behavior indicative of a potentially misleading signal. In some embodiments, attributes may be designated as an outlier in response to the location-attribute score exceeding a threshold difference from the average, for example location-attribute scores more than three standard deviations higher or lower than the average attribute score in the user profile for a given attribute. In some embodiments, the determination of step 58 is made for each of a plurality of attributes of the location, and those attributes deemed to be outliers may be filtered before proceeding to the next step, or some embodiments may return to step 46 in response to the detection of an outlier. Some implementations may use a similarity model to detect inaccurate signals in acquired location histories. Using such models, embodiments may filter out questionable location readings so as not to pollute profile development. For instance, a reliability database similar to the GIS may be referenced during profile analysis by submitting a query with metadata about entries in a location history (e.g., publisher, time of day, location, OS, device type, location determination method (e.g., GPS vs. WiFi™ or other wireless network)). The reliability database may provide a response indicative of the predicted level of accuracy of the incoming location. The reliability database may store data from sources known to be reliable, and this data may indicate expected levels of activity at a location. If a specific data set diverges significantly from this (e.g., attribute scores for a tile are more than a threshold amount different from those in the reliability database), in response, the user profiler may flag the location history as likely being less accurate, and based on such flags, the data may be discarded or changes to user profiles may be down-weighted.
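The three-standard-deviation test of step 58 might be sketched in Python as shown below. The function names are hypothetical, and two behaviors are assumptions of the sketch rather than of the disclosure: attributes with no prior statistics in the profile are kept, and an attribute with zero recorded variance is never flagged.

```python
def is_outlier(location_score, mean, std_dev, k=3.0):
    """True when the score is more than k standard deviations from the mean."""
    if std_dev <= 0:
        return False  # assumption: no spread recorded, so nothing to compare
    return abs(location_score - mean) > k * std_dev


def filter_outliers(location_scores, profile_stats, k=3.0):
    """Drop location-attribute scores that are outliers for the user.

    location_scores: dict mapping attribute -> location-attribute score.
    profile_stats: dict mapping attribute -> (mean, std_dev) from the profile.
    """
    kept = {}
    for attr, score in location_scores.items():
        stats = profile_stats.get(attr)
        if stats is None or not is_outlier(score, stats[0], stats[1], k):
            kept[attr] = score
    return kept
```

The scores that survive the filter are the ones carried forward to the averaging step.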

Upon determining that the location-attribute score is not an outlier, the process 40 proceeds to step 60, and a mean user-attribute score is updated based on the location-attribute score. Updating the user-attribute score may include updating each of a plurality of user-attribute scores based on a plurality of location-attribute scores that were not filtered out in step 58. Updating the user-attribute score may include multiplying the current score by a count of measurements upon which that score is based, adding to the resulting product the location-attribute score, and dividing this sum by the count plus one to calculate a new average user-attribute score. This value and an incremented version of the count may be stored in a corresponding attribute record in the user profile.
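The running-average update of step 60 can be written directly from the description above; the following Python sketch uses a hypothetical function name and returns the new score together with the incremented count.

```python
def update_mean_score(current_score, count, location_score):
    """Fold one location-attribute score into a running average.

    Computes (current_score * count + location_score) / (count + 1).
    """
    new_count = count + 1
    new_score = (current_score * count + location_score) / new_count
    return new_score, new_count
```

For instance, a profile score of 4.0 based on nine measurements, updated with a location-attribute score of 10.0, becomes (4.0 × 9 + 10.0) / 10 = 4.6 with a count of ten, which illustrates how a single uncharacteristic visit is diluted by the prior samples.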

The process 40 may be repeated for each of a relatively large number of location histories, each location history corresponding to a different user profile. The process 40 may be repeated periodically, for example nightly, weekly, or monthly, to update user profiles and instantiate new user profiles. The resulting user profiles may be stored in the above-mentioned user-profile datastore 14.

In some cases, after updating user profiles, various indices may be formed to expedite certain queries to the geographic information system 18. For example, some embodiments may form an index keyed to each attribute for which a score is maintained in the tile records or the user-profile records. For example, embodiments may calculate an index that identifies each tile in which users having more than a threshold score for a given attribute co-occur during one of the above-described time bins (e.g., by multiplying a density score for each tile with an attribute score and thresholding the resulting product). This index may be used to relatively quickly determine whether a given geographic area at a given time is correlated with a given attribute and has a high density of people exhibiting behavior described by that attribute. Further, some embodiments may use such an index to identify geographic areas in which a collection of attributes are relatively strong, for instance determining the union of the set of values corresponding to each of a plurality of different attributes to identify, for instance, where users associated with golfing, fishing, and tourism are at a relatively high concentration on mid-afternoons on Sundays.
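The density-times-score thresholding mentioned above might be sketched in Python as follows. The function name, the flat tuple layout of the tile records, and the keying of the index by (attribute, time bin) are assumptions made for illustration.

```python
from collections import defaultdict


def build_attribute_index(tile_records, threshold):
    """Index tiles by attribute and time bin.

    tile_records: iterable of (tile_id, time_bin, density, scores), where
        scores maps attribute -> attribute score for that tile and time bin.
    A tile is indexed under (attribute, time_bin) when the product of its
    density score and attribute score exceeds the threshold.
    """
    index = defaultdict(set)
    for tile_id, time_bin, density, scores in tile_records:
        for attribute, score in scores.items():
            if density * score > threshold:
                index[(attribute, time_bin)].add(tile_id)
    return index
```

Because each index entry is a set of tile identifiers, entries for several attributes can afterwards be combined with ordinary set operations to answer multi-attribute queries.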

Embodiments of the process 40 may be performed concurrently on multiple computing devices to expedite calculations. For instance, a master computing device may iterate through a list of user device identifiers and assign profiling tasks to each of a plurality of profiling computing devices, each of which determines corresponding profiles for different users at the same time. Using similar techniques, the formation of indices may also be parallelized. For instance, each attribute may be assigned to a different set of computing devices, and that set of computing devices may identify the areas in which the attribute satisfies certain criteria (e.g., greater than a threshold amount of activity), while other sets of computing devices perform the same task for different attributes. Such concurrent operations are expected to facilitate faster computation of profiles and indices than would otherwise be achieved, though not all embodiments provide for concurrent operations.

Other uses of concurrency may expedite data retrieval. For instance, querying a GIS once for each user event (e.g., a particular user being present in a tile during a particular window of time) may be relatively slow, as the number of such events can be very large. To expedite retrieval from the GIS, some embodiments may group events to reduce queries. Such embodiments may include a master computing device (e.g., a virtualized or physical computing instance) that maps each tile to one of a plurality of other computing devices (e.g., a virtualized or physical computing instance) and instructs those devices (e.g., over a local area network in a data center) to gather data from the GIS about events occurring in their respective assigned tile or tiles. In response, the other computing devices may each filter the user events occurring within their respective tile from the obtained location histories, each forming an event group of events occurring within an assigned tile, and submit one or more queries to the GIS for attributes of the tile during relevant time periods corresponding to user events in the group (e.g., when a user was in the tile). After the responsive data is retrieved, the other computing devices may then iterate through each user having user events in the event group and join the responsive GIS data for each user with the corresponding user profile. Thus a single query, or one query per time period in question, may retrieve relevant data for a plurality of user events, thereby reducing the number of queries to the GIS and expediting analysis of user histories. Further, parallelizing the analysis for different tiles across multiple computing devices is expected to further expedite such analyses.
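The grouping of user events by tile, so that one query serves many events, might be sketched in Python as follows. For brevity the sketch runs sequentially in a single process, whereas the passage above contemplates distributing tiles across computing devices; the function name and the query_gis callback, which stands in for a call to the GIS, are hypothetical.

```python
from collections import defaultdict


def join_gis_attributes(events, query_gis):
    """Join GIS tile attributes to user events with one query per tile.

    events: iterable of (user_id, tile_id, time_bin).
    query_gis(tile_id, time_bins): hypothetical stand-in for a GIS call that
        returns a dict mapping each requested time bin -> tile attributes.
    Returns a dict mapping user_id -> list of (tile_id, time_bin, attributes).
    """
    # Form one event group per tile.
    groups = defaultdict(list)
    for user_id, tile_id, time_bin in events:
        groups[tile_id].append((user_id, time_bin))

    joined = defaultdict(list)
    for tile_id, group in groups.items():
        time_bins = sorted({time_bin for _, time_bin in group})
        response = query_gis(tile_id, time_bins)  # single query for the group
        for user_id, time_bin in group:
            joined[user_id].append((tile_id, time_bin, response[time_bin]))
    return joined
```

With four events spread over two tiles, the sketch issues two queries rather than four; in a distributed arrangement each iteration of the second loop could run on the device to which the tile is assigned.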

Further, in the above example of concurrent operation in which different tiles are assigned to different computing devices, each of the other computing devices holds in memory user profiles for users passing through the tile. This user profile data may be aggregated to calculate or update a count for the tile at a particular window of time, e.g., by counting the number of user profiles corresponding to the tile at a particular time and having a particular attribute, such as an attribute score greater than a threshold. Again, concurrent operation is expected to expedite analysis, and aggregating the user profile data corresponding to the respective tiles while this data is in memory for purposes of updating the user profiles is expected to reduce calls to the user-profile datastore and speed analysis.

In some embodiments, analysis of user profiles is parallelized according to the combination of user profile and attribute, such that different computing devices analyze different attributes for a given user concurrently and different computing devices analyze different users concurrently (e.g., mapping user A, attribute X to device 1; user A, attribute Y to device 2; user B, attribute X to device 3; etc.). Again, a master computing device may map profile-attribute pairs to each of a plurality of other computing devices and instruct those devices (e.g., over a local area network in a data center) to sum the counts for those attributes for those users across all of the tiles having data for those users. For instance, the above technique may be used to analyze each of a plurality of tiles concurrently with different computing devices mapped to different tiles, and then the technique of this paragraph may be used to aggregate this data for each user profile/attribute pair across the tiles, e.g., by querying the devices analyzing tiles for data about a given user and attribute and summing (or otherwise aggregating) the responses. Again, this technique is expected to offer relatively fast concurrent operation and reduce calls to data stores that might otherwise slow the analysis.
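A minimal Python sketch of this second stage appears below. It assumes each tile-level device reports a dictionary of counts keyed by (user, attribute) pair; the function names are hypothetical, and the CRC-based assignment of pairs to devices is merely one deterministic mapping a master computing device might use, not one recited above.

```python
import zlib
from collections import Counter


def device_for(pair, n_devices):
    """Deterministically map a (user_id, attribute) pair to a device index."""
    key = "{}|{}".format(pair[0], pair[1]).encode("utf-8")
    return zlib.crc32(key) % n_devices


def aggregate_pairs(per_tile_counts):
    """Sum counts for each (user_id, attribute) pair across tiles.

    per_tile_counts: iterable of dicts, one per tile, each mapping
        (user_id, attribute) -> count observed in that tile.
    """
    totals = Counter()
    for tile_counts in per_tile_counts:
        totals.update(tile_counts)  # adds counts key by key
    return totals
```

A checksum is used instead of Python's built-in hash() because the latter is randomized between interpreter runs for strings, which would make the pair-to-device mapping unstable.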

The user profiles resulting from the above-described processes and systems may be used in a variety of contexts. For example, as noted above, advertisements may be selected based on the user profiles. In another example, the user profiles may be used to do market research, for example, by identifying which attributes score relatively high at each of a business's locations at certain times to characterize the customers of that business. In another example, the user profiles may be used to customize the look and feel or operation of software operated by the user, for instance configuring an application differently for a user known to have children relative to the look, feel, or operation presented to a user who has attribute scores that indicate that user likely does not have children.

Thus, the above-described processes may yield user profiles in an automated fashion, at relatively low expense, and in a privacy-friendly manner. Associating attributes of geographic locations visited by the user to the user's profile, and accounting for the time of day at the geographic location, and for the user, are expected to yield relatively accurate user profiles that account for the different ways people behave during different times of the day. Further, inferring intermediate locations is expected to yield a relatively high-resolution characterization of users, and determining whether the user is at an adjacent location, whether adjacent locations have consistent attributes, and whether the attributes of a given location are outliers for the user are expected to further improve the quality of the user profile. It should be noted, however, that not all embodiments provide these benefits.

As noted above, in some cases, the user profiler 12 of FIG. 1 generates user profiles based on the location histories of mobile devices 16. As mentioned, in some use cases, the location histories are not acquired directly from the mobile devices 16 and are instead obtained from third parties, such as cell phone carriers, advertising networks, operators of native mobile device applications, and other entities with access to location histories from a relatively large number of users, such as tens of thousands, hundreds of thousands, or millions of users. These location histories may be provided in a location data set including location histories from a relatively large number of users collected over some duration of time, such as over the preceding hour, day, week, month, or year. In some cases, a batch process is used to generate user profiles in response to receipt of a new location data set.

As discussed above, often different providers of location data sets use different identifiers for a given user. Often, the location data set identifies users uniquely within a given data provider's system, but does not identify users canonically, as each data provider often has a different unique ID for the same user, thereby impeding efforts to match a record (e.g., a location history) about a user from one data provider with a record about the same user from another data provider. (The use of non-canonical identifiers is referred to as anonymization herein, but the intent behind the use of these inconsistent identifiers need not be to anonymize for the present techniques to apply.) Also as noted above, when users update their equipment, e.g., with a new cell phone, it can be difficult to tie their existing profile to data from the new equipment, as the user identifiers are often based on identifiers of the mobile device. Consequently, it can be difficult to use location data sets from third parties to generate user profiles.

FIG. 4 shows an example of a computing environment 70 configured to update user profiles based on anonymized data from multiple third-party data providers. In some cases, the computing environment 70 includes the components of the computing environment 10 of FIG. 1, as the techniques described with reference to the computing environment 70 may yield richer location histories of users that are used by the user profiler 12 to generate user profiles. As illustrated, in addition to the features shared with the computing environment of FIG. 1, the computing environment 70 of FIG. 4 includes an anonymized-user-identifier matcher 72 and a plurality of third-party data providers 74. Components 72 and 74 may be constructed from one or more properly programmed computer systems, like those described below with reference to FIG. 9, based on the operations described below.

Reference numbers shared between FIG. 4 and FIG. 1 refer to the components and features described above. That said, the techniques described below are not limited to systems having the features described above and are independently useful for other purposes, such as profiling geographic areas rather than profiling users (which is not to imply that the system above is limited to profiling users).

In some cases, the computing environment 70 of FIG. 4 executes some or all of a process described below with reference to FIG. 5 to match user identifiers across location data sets from different third-party data providers and generate (e.g., create, update, or otherwise augment) user profiles. As explained below, in some embodiments, user identifiers are matched between data providers based on similarities between time and geolocation data associated with the user identifiers in data from different data providers.

For instance, cellular records in a location data set may indicate that anonymized user ID “12Dfs93sadkg38” (e.g., a number based on a hash of the user's UDID or phone MAC address) is within a first 100-meter square geographic area for 80% of workday hours (e.g., in their workplace) and within a second 100-meter square area for 60% of weekend hours (e.g., in their home). And data from an ad network may indicate that another anonymized user ID “19371349839849302355” is associated with an account that viewed ads in the same first 100-meter tile during workday hours and in the same second 100-meter tile during weekend hours. Based on these similarities, embodiments may determine that anonymized user ID “12Dfs93sadkg38” from the cellular records corresponds to the same user as anonymized user ID “19371349839849302355” from the ad network. Embodiments may also calculate a confidence value for the correlation based on the amount of similarity and the sample size. Based on this correlation, embodiments may generate (e.g., create, update, or otherwise augment) a user profile of the user based on data from both the cellular network and the ad network. The generated user profile may be used by advertisers to target more relevant ads to the user.
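To make the example concrete, the Python sketch below matches identifiers across two data sets by comparing how each identifier's observations are distributed over (tile, time bin) pairs. The overlap measure, the minimum-score cutoff, and all function names are assumptions of this sketch rather than the matching technique of the disclosure, and the sketch compares every pair of identifiers, which would likely be too slow at the scales discussed herein without further optimization.

```python
from collections import Counter


def footprint(history):
    """Share of an identifier's observations in each (tile, time bin) pair.

    history: iterable of (tile_id, time_bin) observations.
    """
    counts = Counter(history)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def overlap(fp_a, fp_b):
    """Similarity in [0, 1]: sum of the smaller share for each shared key."""
    return sum(min(w, fp_b[key]) for key, w in fp_a.items() if key in fp_b)


def best_matches(data_set_a, data_set_b, min_score=0.5):
    """For each ID in data set A, find the most similar ID in data set B.

    data_set_a, data_set_b: dicts mapping anonymized ID -> history.
    Returns a dict mapping ID in A -> (ID in B, similarity score).
    """
    footprints_b = {uid: footprint(h) for uid, h in data_set_b.items()}
    matches = {}
    for uid_a, history in data_set_a.items():
        fp_a = footprint(history)
        scored = [(overlap(fp_a, fp_b), uid_b)
                  for uid_b, fp_b in footprints_b.items()]
        if scored:
            score, uid_b = max(scored)
            if score >= min_score:
                matches[uid_a] = (uid_b, score)
    return matches
```

A confidence value of the kind mentioned above could, for example, combine the similarity score with the number of observations behind each footprint, since a high overlap computed from only a handful of observations is weaker evidence than the same overlap computed from hundreds.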

In many systems, however, the matches may be more complex than matching single tiles at one or two times of day. Often location histories for a given user span several point clouds of locations and times (e.g., each corresponding to a cluster near their work, near their home, and near businesses or roads they frequent), with each geolocation being associated with varying degrees of geographic resolution and confidences in the determination of location. Further, the location data sets are often relatively large, covering millions of users each with such a point cloud developed over days or weeks, such that processing the data can take a relatively long time to identify matches absent techniques to expedite the operations, such as appropriate use of concurrency and data structures configured to expedite data retrieval. Embodiments described below may accommodate this added complexity (though the present techniques are not limited to the more complex use cases).

The anonymized-user-identifier matcher 72 may execute certain steps of a process shown in FIG. 5 for matching user identifiers across location data sets, using the matches to generate user profiles, and using the user profiles to select advertisements to present to the profiled users. Like the other processes described herein, embodiments are not limited to the order of the steps depicted, nor to systems that perform all of the illustrated steps, as data at intermediate stages is independently useful.

For example, more generally, embodiments may obtain a plurality of location data sets, each location data set being from a different third-party data provider and having a different user identifier for a given user (e.g., an identifier based on the user's equipment or account). Each location data set may include a plurality of user-activity records. Each user-activity record describes activity of a single user on the respective third party's system, including the user's location history (like a sequence of time-stamped geolocations) and, in some cases, additional context, like information about the user's device or information about a computing session. Some user-activity records include only a list of user identifiers and associated location histories, and some user-activity records include additional context. Additionally, each user-activity record has a user identifier that is unique among other users of that system (e.g., a hash of the user's MAC address or phone number), but does not serve to explicitly identify (e.g., with a string text match) the user in records from other third-party systems, e.g., because those other systems use a different hash algorithm or different input data for forming an anonymized user ID.

Some embodiments of the process 80 include obtaining a plurality of location data sets from different providers of user geolocation history, as shown in block 82. As noted above, in some cases, geolocations are obtained by the third-party data providers 74 based on wireless signals sent by or received from the mobile user devices 16. Examples of such wireless signals include wireless signals from satellite positioning systems sensed by a global positioning system location sensor on the mobile user devices 16. Other examples include cell tower triangulation of the mobile user device, for instance, performed by, or at the behest of, operators of cellular networks, or wirelessly transmitted locations sent from the mobile user devices to the cellular networks.

In some cases, a native mobile application (e.g., as downloaded from a pre-approved collection of such applications on a platform hosted by an entity providing an operating system for the mobile device) on the mobile user devices 16 queries the operating system of the device to obtain a geographic location of the device. For example, some native applications executing on the iOS™ operating system may instantiate a member of the CLLocationManager class provided within that operating system and use methods of the class to obtain the geolocation of the mobile user device. In another example, some native applications executing on the Android™ operating system may instantiate a member of the LocationProvider class provided by the operating system and use methods of the class to obtain the geolocation of the mobile user device. In some cases, third parties may distribute native applications on multiple operating systems, and different mobile user devices having the different operating systems may report back geolocation for inclusion in a single location data set.

Each location data set may include location histories of a relativelylarge number of users, e.g., more than ten thousand or more than onemillion. Within a location data set, the location histories may spansome duration of time, for instance, the previous hour, day, week, ormonth (though the trailing duration need not extend to the currenttime). Each location history may include a plurality of time-stampedgeolocations documenting the geographic location of a mobile user deviceover time (e.g., at times a native application is interacted with by auser or at times an advertising network serves an ad to the user, e.g.,in a mobile web browser). In some cases, the location histories for agiven user include more than ten, more than a hundred, or more than athousand time-stamped geolocations, depending on the fidelity of thedata. In some cases, the geolocations are expressed with coordinates,such as latitude and longitude, with varying levels of significantdigits (or other measures of granularity) among the data providers. Orin some cases, for some providers, the geolocations are expressed withreference to some other geographic area, such as zip codes, oradvertising designated marketing area (DMA). In some cases, eachgeographic location is associated with a geolocation confidence scoreindicative of the reliability of the measured location, e.g., 80%confidence radius based on the quality of GPS signals received, or apercentage indicative of the confidence that the device is within somepre-determined radius, such as within 100 meters of the reportedgeolocation. Different data providers may provide location data setswith different permutations of the preceding attributes of locationhistories.

For each location data set, embodiments may generate (e.g., extract,validate, and normalize) a plurality of user-location records from thedata set, the user-location records being normalized location histories,e.g., from user activity records in the location data sets. Eachuser-location record may include the third-party user identifier for therespective location data set and a set of time-stamped geolocations ofthe user (e.g., latitude and longitude coordinates, or identifiers ofgeographic areas, such as identifiers of tiles). In some cases, thetime-stamps describe a duration of time during which the user was in thearea.

In some cases, the user activity records are normalized so thatlocations and times from different third-party data providers areexpressed in the same format, for instance, by converting expressions oftime using one set of units into expressions of time using a common setof units (such as the units of the tile times discussed above) and byconverting expressions of geolocation using one set of units (such aslatitude and longitude or zip code) into expressions of geolocationusing a common set of units, such as the above-described tiles.
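
The normalization described above can be illustrated with a minimal sketch. The tile density (TILES_PER_DEGREE), the hour-of-week time bin, and the record field names (lat, lon, ts) are assumptions chosen for illustration only; the tiles and tile times of a given embodiment may be defined differently.

```python
import math
from datetime import datetime, timezone

TILES_PER_DEGREE = 100  # hypothetical tile density, not taken from the disclosure


def to_tile(lat, lon, tiles_per_degree=TILES_PER_DEGREE):
    """Map a latitude/longitude pair to integer (row, col) tile indices."""
    row = math.floor((lat + 90.0) * tiles_per_degree)
    col = math.floor((lon + 180.0) * tiles_per_degree)
    return row, col


def to_hour_of_week(epoch_seconds):
    """Map a Unix timestamp to an hour-of-week bin (0 = Monday 00:00 UTC)."""
    t = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return t.weekday() * 24 + t.hour


def normalize(record):
    """Return (tile, hour_of_week) for a provider record with lat, lon, ts keys."""
    return to_tile(record["lat"], record["lon"]), to_hour_of_week(record["ts"])
```

In this sketch, two providers reporting slightly different coordinates for the same place would yield the same tile, so their records become directly comparable.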

Normalizing times to the above-described times of the time-tile recordsmay make the matching process relatively sensitive to periodic behaviorof users, as the time-tile records in some embodiments correspond toactivities in a work-week cycle. This is expected to improve accuracyrelative to non-periodic detection techniques for users who generallyfollow a weekly schedule, such as weeks in which they spend more time onthe weekend at home and more time during the workweek in the office.Other periodic cycles may be used, e.g., a daily periodic cycle. Or,other embodiments may normalize time to a non-periodic value, such as anumber of minutes or hours since an arbitrary date to account fornon-periodic behavior of users.

In some cases, some location data sets may use relatively low-resolutionexpressions of geolocation, such as a zip code, or a latitude andlongitude with relatively few significant digits. In some cases, asingle geolocation from a third party location data set may correspondto multiple tiles. In some cases, normalization may include convertingthe low-resolution geolocation to a center tile among the correspondingmultiple tiles (for instance, selecting a centroid tile in a collectionof tiles covering a zip code). In other cases, a single low-resolutiongeolocation from a third party data provider may be normalized bymapping that geolocation to a plurality of tiles covering thecorresponding low-resolution geolocation (such as all of the tilescovering a zip code). In this example, the plurality of tilescorresponding to a single low-resolution input geolocation may beassigned a weight, such as the reciprocal of the number of tiles towhich the single low-resolution input geolocation is mapped, and thatweight may be accounted for when matching clusters from differentlocation data sets. For instance, a center of gravity that accounts forthe weight of the vectors may be calculated for each cluster, andclusters may be compared based on differences in distance between thecenters of gravity.
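
As a hedged illustration of the weighting described above, the following sketch spreads one low-resolution geolocation over its covering tiles with reciprocal weights and computes a weighted center of gravity. The function names and the representation of tiles as coordinate tuples are assumptions made for this example.

```python
def spread_over_tiles(tiles):
    """Give each covering tile a weight of 1 / (number of covering tiles)."""
    weight = 1.0 / len(tiles)
    return [(tile, weight) for tile in tiles]


def center_of_gravity(weighted_vectors):
    """Weighted mean of equal-length vectors given as (vector, weight) pairs."""
    total = sum(w for _, w in weighted_vectors)
    dims = len(weighted_vectors[0][0])
    return tuple(
        sum(v[i] * w for v, w in weighted_vectors) / total for i in range(dims)
    )
```

For example, a zip code covered by four tiles contributes four vectors of weight 0.25 each, so together they count as much as one precisely located observation.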

Additionally, in some cases, confidence scores for geolocations inlocation data sets may be normalized with a variety of techniques. Insome implementations, the confidence scores are ignored, e.g., whenlarge sample sizes are sufficient to overcome noise. In another example,weights may be assigned to time-location vectors based on (e.g., as aninverse function of) the confidence scores, for instance, assigning aweight of 0.8 to a vector based on an 80% confidence score. Such weightsmay be accounted for when matching clusters from different location datasets, e.g., using the technique described above.

Some embodiments of process 80 may include matching user identifiersbetween the location data sets based on geolocations of thecorresponding user and times that the corresponding user was at thegeolocations. For each user-location record of a given data set,embodiments may search other data sets for corresponding user-locationrecords. For instance, for the respective user-location record of thegiven data set, embodiments may calculate time-location vectors for eachlocation-time-stamp pair in each user-location record (e.g., withscalars of latitude (or a tile-count equivalent), longitude (or atile-count equivalent), and time-period elapsed since a fixed reference,or time of the week to capture periodic behavior associated with thework week).

To prepare to match user identifiers across location data sets, some embodiments may then identify clusters among the vectors using, for instance, various centroid-based clustering algorithms (e.g., k-means), density-based clustering algorithms (e.g., DBSCAN), or distribution-based clustering algorithms (e.g., a Gaussian distribution model). As a result, each location data set may include a plurality of cluster records, each cluster record corresponding to a user identifier in that location data set, and each cluster record including one or more clusters of time-location vectors based on the location histories of that user. Some embodiments may then perform this vectorization and clustering process for each user-location record in the other data sets, yielding additional sets of clusters, each set being associated with a user identifier of the respective location data set.

To expedite operations, some embodiments may cluster location historiesfor multiple user identifiers concurrently. For instance, a plurality ofprocesses executing on a plurality of computers may concurrently performa process of 1) requesting a location history for a given useridentifier; 2) clustering the location history; 3) returning a set ofclusters for the location history of the given user identifier; 4)repeating. In some cases, the dimensions of the clusters may be reducedas part of forming the clusters to reduce memory access time ofprocesses and data transfer times between processes. For instance, acluster may be represented as a center point (e.g., a centroid or centerof mass for weighted vectors) and a radius. Or a cluster may berepresented in output of the clustering process by a bounding volume,such as a convex hull of the points in the cluster.

These sets of clusters for a given user identifier may be compared tosets of clusters for user identifiers in other location data sets tomatch user identifiers across location data sets. To this end,embodiments may perform a matching process for each user identifier in agiven location data set. The matching process for a given useridentifier may include: 1) obtaining the cluster record for the givenuser identifier in the location data set (which may be referred to ashaving the reference cluster set); 2) obtaining all cluster records foranother location data set; and 3) determining which cluster record inthe other location data set has a cluster set (called a comparisoncluster set) most similar to that of the reference cluster set. The useridentifier of the most similar comparison cluster set may be deemed amatch to the user identifier of the reference cluster set.

Thus, to match user identifiers, in each of the other data sets, theuser-location record with the most similar comparison clusters to thereference clusters may be deemed a matching user-location record. Insome cases, a similarity score is calculated for each pair of referencecluster and comparison clusters, and the similarity score must exceed athreshold to constitute a match. When multiple comparison clustersexceed the threshold, the most similar comparison cluster may be deemeda match, or the most similar cluster may be deemed a match withoutregard to a threshold.
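
The thresholded selection described above might be sketched as follows. The function name best_match, the dictionary of candidates, and the caller-supplied score function (higher meaning more similar) are hypothetical and serve only to illustrate the two variants, with and without a threshold.

```python
def best_match(reference, candidates, score, threshold=None):
    """Return the id of the highest-scoring candidate, or None if none qualifies.

    candidates maps user identifiers to comparison cluster sets; score is any
    similarity function in which larger values mean more similar.
    """
    best_id, best_score = None, None
    for cand_id, cand in candidates.items():
        s = score(reference, cand)
        if threshold is not None and s <= threshold:
            continue  # the score must exceed the threshold to count as a match
        if best_score is None or s > best_score:
            best_id, best_score = cand_id, s
    return best_id
```

Passing threshold=None corresponds to deeming the most similar candidate a match without regard to a threshold.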

Similarity scores may be calculated with a variety of techniques, including based on a Euclidean distance between centroids of the clusters being compared (or based on Euclidean distance of centers of mass for weighted vectors) or based on a volume of overlap of convex hulls of the clusters. For instance, a root mean square of the Euclidean distances between each cluster in a reference cluster set and a closest cluster in a comparison cluster set may be calculated as a similarity score. Or, in another example, the similarity score may be the ratio, expressed as a percentage, of the overlapping convex hull volume to the sum of volumes of the convex hulls of clusters in a reference cluster set and a comparison cluster set.
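
A minimal sketch of the root-mean-square variant follows, assuming each cluster has already been reduced to a centroid tuple. Note that this quantity is a distance, so in this sketch lower values indicate more similar cluster sets; an implementation that wants a score where higher is better would need to invert it.

```python
import math


def rms_nearest_distance(reference, comparison):
    """RMS of the distance from each reference centroid to its nearest
    comparison centroid; lower values indicate more similar cluster sets."""
    squared = [
        min(math.dist(r, c) for c in comparison) ** 2
        for r in reference
    ]
    return math.sqrt(sum(squared) / len(squared))
```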

Because matching may be a time-intensive process for data sets with a large number of user identifiers, some embodiments may expedite processing with various techniques. For instance, some embodiments may execute concurrent matching processes in which each of a plurality of different processes (e.g., more than ten, one hundred, or one thousand) is assigned a different reference cluster set for which a match is to be found. To expedite matching, some embodiments may group or sort and store the comparison cluster sets according to location (or time), so that unviable comparison cluster sets may be quickly discarded and likely matches may be quickly identified. For instance, the comparison cluster sets may be grouped according to the US state most heavily represented among their constituent clusters and the reference cluster set may be similarly grouped. The reference cluster set may then only be compared to comparison cluster sets in the same or adjacent US states (or other arbitrary geographic areas of varying size may be used for grouping). Or the comparison cluster sets may be pre-sorted. For instance, a centroid of each comparison cluster set may be determined, and comparison cluster sets may be sorted according to location of the centroid on a space filling curve, like a z-curve. Embodiments may then search only within a threshold distance on the space filling curve for matches.
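
The z-curve pre-sorting can be sketched as below, assuming centroids have been quantized to non-negative integer tile coordinates. The function names and the 16-bit coordinate width are assumptions for illustration. Proximity on the curve is only a heuristic for spatial proximity, so a sketch like this narrows the candidate list rather than deciding matches.

```python
import bisect


def interleave_bits(x, y, bits=16):
    """Morton (z-order) code: interleave the low `bits` bits of x and y."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def candidates_near(sorted_codes, code, max_gap):
    """Return the codes within max_gap of `code` from an ascending list."""
    lo = bisect.bisect_left(sorted_codes, code - max_gap)
    hi = bisect.bisect_right(sorted_codes, code + max_gap)
    return sorted_codes[lo:hi]
```

The comparison cluster sets would be sorted once by the code of their centroid, after which each reference cluster set requires only two binary searches to find its candidate window.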

Embodiments may proceed through each of the other data sets, searching for the matching user-location records in the remaining data sets for each user-location record in the other data set that has not yet been matched. Thus, embodiments may start, for example, with cellular carrier data from which reference clusters are formed and matched to other data sets and then proceed through those other data sets, e.g., a data set from an ad network, and form reference clusters for, for example, the ad network to be matched.

In some cases, information beyond time and location may be used for matching. For instance, in some embodiments, each user-activity record includes additional information about the mobile user devices, such as an operating system type, a device manufacturer identifier, a component manufacturer or version identifier, a software maker or version identifier, or the like. Some embodiments may use this additional information to match user identifiers across data sets from different third-party data providers. For example, candidate matches may be filtered based on whether this additional information is consistent across user activity records from two different location data sets. A cluster set from location data set A for a given user identifier may align relatively closely in time-location vector space with the cluster set from location data set B associated with another, potentially corresponding user identifier, and upon detecting this potential match, some embodiments may then determine whether any additional information in the data set A for the potentially corresponding user is inconsistent with that of data set B, determining, for instance, to reject the match in response to determining that the operating system is different, the component manufacturers are different, or the like. Upon rejecting a match, embodiments may then evaluate the next closest match.
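
One way to picture this filtering is the sketch below, in which device attributes are plain dictionaries and candidates arrive already ranked from closest to farthest in time-location vector space. The field names and helper names are hypothetical. A field missing from either record is treated as not inconsistent, which is an assumption of this sketch rather than a requirement of the technique.

```python
def consistent(attrs_a, attrs_b):
    """True unless a field present in both records has different values."""
    shared = attrs_a.keys() & attrs_b.keys()
    return all(attrs_a[k] == attrs_b[k] for k in shared)


def first_consistent_match(ranked_candidates, reference_attrs):
    """Walk (id, attrs) pairs from closest to farthest and return the first
    id whose attributes do not contradict the reference, else None."""
    for cand_id, cand_attrs in ranked_candidates:
        if consistent(reference_attrs, cand_attrs):
            return cand_id
    return None
```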

In other cases, the same user may use different mobile user devices to interact with systems of different third-party data providers, for example, using a tablet for certain third-party data provider services and using a cell phone for other third-party data provider services. To account for this use case, some embodiments may ignore differences in the other aspects of user activity records, such as different device manufacturers, and match user identifiers based on activity occurring on different devices with patterns that overlap or are similar in space and time.

In some cases, the preceding steps of process 80 may be performed bycomponent 72 of FIG. 5, while the subsequent steps may be performed bythe components described with reference to FIG. 1. Next, in the process80, some embodiments store the matched user identifiers in associationwith one another in corresponding user profiles, e.g., in theabove-described profiles, as shown in block 86. The user-identifiers ofthe matching user-location records may be stored in association with oneanother in memory, e.g., in a canonical user-identifier record, havingthe matching user-identifiers from the different data sets and acanonical user identifier associated with a user profile of the user. Insome cases, each link to a user identifier is also associated with aconfidence score based on (e.g., equal to) the similarity score of theclusters. Examples of the user profiles are described in U.S. patentapplication Ser. No. 13/734,674, filed Jan. 4, 2013, which is herebyincorporated by reference in its entirety for all purposes.

Some embodiments of process 80 then select user-activity records having matched user identifiers and augment a user profile of a corresponding user based on information in the selected user-activity records, as shown in blocks 88 and 90. Finally, some embodiments may select an advertisement based on the augmented user profile, as shown in block 92.

Embodiments are not limited to matching across data sets from different third party location data set providers. In some cases matching may be performed within location data sets from a single provider, or from a single provider to an existing user profile.

In some cases, user-location records within a single data set arematched to detect that a user has begun using a new computing device.For example, a user may acquire a new cell phone, and a carrier maycalculate a different user identifier hash value based on a differentMAC address of the new phone. As a result, the new user-activity recordsfor the user will generally not tie to the older user-activity recordsfor the same user, due to different resulting user identifierscalculated based on different attributes of the user's computing device.Embodiments may mitigate this problem by matching the new useridentifier to the user identifier from the same data providercorresponding to the user's older cell phone, or other user computerdevice. The correspondence may be stored in memory, and records from thedata provider relating to either the older or the newer user identifiermay be used to augment the same user profile. Consequently, in somecases, an existing user profile may continue to be used even when theuser switches to new computing equipment (or a new account, or otheraspect from which an anonymized user identifier is determined by thedata provider).

Further, in some cases, user-location records in a user profile arematched to the given data set, rather than (or in addition to) matchingdirectly between data sets. For example, user profiles may be associatedwith a user-location record that is augmented with additionaltime-stamped user locations as new data sets are matched to that profileand as additional data is acquired from the data providers.

Some embodiments refine or serve as alternatives to some of theabove-described techniques for reconciling different computing deviceidentifiers (also called user identifiers in some cases) used bydifferent sources of network activity data indicating geolocations ofthose computing devices, depending upon the implementation. In somecases, maintaining a single, canonical device identifier in ageolocation analytics system may give rise to certain challenges.

For example, in some cases these identifiers are matched to external-namespace device identifiers in a batch process, often including several months or years of trailing data. In some cases, these batch processes are relatively time-consuming and, as a result, are run relatively infrequently. This can cause delays between when the true mapping between external and internal-namespace device identifiers changes and when that change is reflected within the location analytics system. It can be expensive, slow, and in some cases computationally infeasible to reprocess all existing logs, databases, and relationships therein to reflect a change in internal-namespace device identifiers. (None of which is to suggest that batch processes are disclaimed or inconsistent with the present techniques.)

Another consequence of some versions of this approach is that problems can arise when internal-namespace device identifiers are changed, potentially breaking mappings in existing data structures constructed over months and potentially years. Often historical data is logged in the geolocation analytics system in association with internal-namespace device identifiers, and often other data is enriched (e.g., profiles of people or places) in association with those identifiers. Internal-namespace device identifiers may change for a variety of reasons. For example, a naming schema may be changed to accommodate a larger population or increase the semantic content of internal-namespace device identifiers. Or better information may be obtained about the appropriate mapping between external and internal namespace identifiers, and an internal-namespace device identifier previously considered as distinct, or even unknown, may become mapped to an external-namespace device identifier on receipt and analysis of information about the relationship. Such changes can break backward compatibility with older data. In some cases, the appropriate mappings between external namespaces and the internal namespace may evolve over time, as information is gathered, but in the interim, the extant mappings may be used to create various data structures encoding relatively computationally expensive analyses that are difficult to reuse in the absence of the mappings present at the time the analyses were performed.

Complicating these changes, in some cases, information only pertains to a subset of the external-namespace device identifiers by which a given device (or what is perceived to be a given device based on limited data) is identified. For example, records may be obtained to indicate a given device (that is, a computing device, for instance a mobile computing device) is likely designated by one source of network data with identifier X and it is likely that that same given device is designated by another source of network data with identifier Y. Later, data may be obtained that suggests identifier X actually more likely pertains to a different internal-namespace device identifier, but that data may not apply to the mapping from device identifier Y.

Finally, each of these issues is further complicated by the scale ofdata in commercial use cases and the expected responsiveness of systemsthat are commercially deployed. Many more computationally naïveapproaches to addressing the above problems, while simpler to implement,do not scale adequately, for instance leading to algorithms that doublein computational complexity, memory consumption, or runtime with eachadditional device identifier or network transaction. In many cases, thenumber of computing devices being analyzed exceeds 100,000, and in manycases more than 1 million, or more than 100 million, and the number ofnetwork activity records indicating geolocated transactions may exceed 1million, and in many commercially relevant use cases exceed 1 billion,for instance, within a given month. New data points may be created at arate exceeding 500 to 2000 per second. At the same time, those usinggeolocation analytics systems expect relatively current data andanalyses from those systems. Existing techniques for configuringcomputer systems are not suitable to the task.

Some embodiments may mitigate these and other issues, or various subsets thereof, with the technique described below with reference to FIGS. 6 and 7. In some cases, an associative data structure may be created for each external namespace (e.g., for operating system specific advertising device identifiers, for hashed anonymized device identifiers generated by other data sources, and the like). In some cases, a given external-namespace device identifier may be used to calculate an index that points to a record in one of these data structures, and that record may store a plurality of different internal-namespace device identifiers. This plurality of different internal-namespace device identifiers may evolve over time (gaining and losing members) and provide mappings that persist even when a new internal-namespace device identifier is mapped to a given external-namespace device identifier. In some cases, data associated with these internal-namespace device identifiers may indicate reliability of these device identifiers and relationships therebetween, for instance in a versioning graph indicating a lineage of internal-namespace device identifiers.

With these records, older analyses and logs may remain useful even whenmappings change, as one of the plurality of different device identifiersmay serve as a link back into the older records while still permittingthe mappings between external and internal-namespace device identifiersto evolve. Further, the data structures may provide relatively lowlatency access to these internal-namespace device identifiers, with theaccess technique, with relatively high granularity, supporting differentmappings for different external namespaces. The flexibility afforded bythese techniques may support real-time updates to the mappings,eventually yielding more accurate and up-to-date analyses from ageolocation analytics system.

In some embodiments, the process 100 of FIG. 6 may create these data structures and update a multi-namespace mapping stored in these data structures. In some cases, the process 100 may operate on an asynchronous set of streams of data, for example, streams of network activity data including the information in the above-described location data sets. Or in some cases, the process 100 may operate as a batch process, for example, executed nightly, weekly, or monthly, on batches of logged streams of network activity data. In some cases, the process 100 may be executed in multiple instances, concurrently, in some cases asynchronously, to parallelize operations on relatively large amounts of data. Or in some cases, a single instance may operate on the data. The process 100, and the other processing functionality described herein, may be implemented with machine-readable code stored on a tangible, non-transitory, machine-readable medium, such that when instructions in the code are read and executed by one or more processors, the operations and functionality described are effectuated. In some cases, the medium is distributed among a plurality of different computing devices, in a distributed application, and different subsets of the instructions are stored on different portions of the media and executed by different ones of the processors (a scenario included in references to “a medium”).

In some embodiments, the process 100 begins with accessing sources ofnetwork activity log data having different external-namespace deviceidentifiers, as indicated by block 102. There may be a variety ofdifferent sources of network activity log data. Examples include datareported by entities providing native applications on mobile computingdevices and having access to the locations of those computing devicesreported via network activity. Other examples include entities providingnetwork access, such as cellular service providers and entitiesoperating networks of wireless local network access points. Otherexamples include entities serving content to mobile computing devices,like advertising networks responding to requests for advertisements toinsert into webpages or native applications, in some cases with thoserequests including a geolocation of the computing device to inform theselection. In some cases, the source may provide an applicationprogramming interface exposing a feed of the network activity log data,or the source may provide a batch of such data uploaded periodically.The data need not be pulled from a log, rather the term “log” indicatesthe data is of the sort that is logged, for example in server logs. Thenumber of sources is expected to be relatively large and diverse, forexample, some embodiments may receive data from more than 6 or 12different sources, and in some cases more than 50.

In some cases, the different sources may provide this data in adifferent format from one another. In some cases, computing devices maybe identified in the network activity log data according to anidentifier assigned by a provider of an operating system of thecomputing device, like an identifier for advertising (IDFA) assigned bythe iOS™ operating system or an Android™ advertising ID. In other cases,the provider of the data may anonymize the device identity byidentifying the device with a cryptographic hash of various attributesof the device, like a hash of the device's MAC address and processorversion.

In some cases, these external-namespace device identifiers may change,for instance, a user may request to change their device identifier, orany of these entities may periodically cycle device identifiers, forexample by adding a random value that changes periodically to the inputof the hash function. In some cases, these changes may occur every fewmonths for a subset of the devices. In some cases, a given data sourceprovides multiple device identifiers, and in some cases, the sameexternal namespace is used by a subset of the data sources, forinstance, 3 of 12 data sources may use the device identifier provided bythe operating system, while others may use a bespoke cryptographic hashfunction to calculate an external-namespace device identifier.
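
The identifier cycling mentioned above can be illustrated with a short sketch in which a provider hashes a device attribute together with a value it rotates periodically. The function name, the use of SHA-256, and the salt format are assumptions for illustration; actual providers may use other schemes.

```python
import hashlib


def anonymized_id(mac_address, salt):
    """Hash a device attribute together with a salt that the provider rotates."""
    payload = f"{salt}:{mac_address}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

With this scheme the same device yields a stable identifier while the salt is unchanged and an unrelated-looking identifier once the salt rotates, which is why a mapping keyed on a single external identifier can go stale.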

The namespace of these external device identifiers is the set of valid device identifiers for that respective namespace and the rules by which identifiers are assigned. For example, a relatively simple namespace may assign device identifiers according to a four digit code that counts from 0000 to 9999 with each new device added to the system. In practice, the design of these namespaces is much more complex and includes design considerations like managing privacy, accommodating hundreds of millions of devices with unique respective identifiers, efficiency of encoding of identifiers for network transmission and storage, and in some cases imparting semantic content to the identifier, like with a prefix that indicates an attribute of the device, such as its operating system. Different data providers often arrive at different namespaces given their differing goals and design choices, and in some cases different data providers intentionally choose different namespaces to encourage more concentrated usage of their system.

In some cases, the network activity log data may include a plurality ofrecords, for example a stream of records, with each record indicating aninstance of network activity by a given computing device. In some cases,the record may include an external-namespace device identifier, atimestamp, a geolocation sensed by the computing device (or otherwiseobtained), and an indication of the information exchanged over thenetwork or other aspects of user behavior. As noted, these records maystream at a relatively high rate, for example, at a rate higher than 500to 2000 per second, in some cases for each source of network activitylog data, and batch processes may operate on sources of network activitylog data including more than 1 billion records spanning more than onemonth of trailing duration of time.

Next, from the accessed sources of network activity log data, some embodiments may obtain a network transaction record having an external-namespace device identifier, as indicated by block 104. In some cases, these records may be obtained as part of the above-described feeds, for example, asynchronously from a plurality of different data sources.

Next, some embodiments may select an external-namespace-specific mappingbased on the external namespace of the obtained record, as indicated byblock 106. In some cases, the selection may include identifying a sourceof the network activity log data containing the obtained record, andselecting the external-namespace-specific mapping from among a pluralityof such mappings based on the identity of the source. Examples of suchmappings are described below with reference to FIG. 7. In some cases,each source or each external namespace may have a correspondingexternal-namespace-specific mapping. In some cases, these mappings mayassociate external-namespace identifiers within a specific externalnamespace with internal-namespace device identifiers. In some cases, theinternal namespace may be a namespace used by the geolocation analyticssystem to distinguish computing devices from one another and to identifydata related to the same computing device. In some cases, theexternal-namespace-specific mappings may be configured to providerelatively low latency access to internal-namespace device identifiersmapped to a given external-namespace device identifier. For example,some embodiments may implement a variant of the hash tables, binarytrees, or prefix trees described below. That said, not all embodimentsprovide these benefits, as various independently useful techniques aredescribed herein.

In some cases, the selection may occur before obtaining the network transaction record. For example, some embodiments may establish a connection to the external-namespace-specific mapping before accessing the sources of network activity log data that correspond to that mapping.

The mappings may associate individual external-namespace device identifiers with one or more internal-namespace device identifiers. In some cases, a given internal-namespace device identifier may appear in multiple external-namespace-specific mappings, as that computing device may have different identifiers used by the different sources of network activity log data. Further, because information about the correct mappings may change over time, and that information may only pertain to a subset of the external-namespace device identifiers, in some cases, different external-namespace device identifiers may be mapped to different, but partially overlapping, sets of internal-namespace device identifiers in the different external-namespace-specific mappings.

Next, some embodiments may determine a hash value based on the external-namespace device identifier, as indicated by block 108. Hash functions generally generate a fixed-length output from an input, where each part of the input contributes to the output, even if the input is substantially longer than the output, e.g., as in an MD5 hash or SHA-256 hash.

In some cases, the hash value may be the output of a hash function configured to output values in a range of an index of the external-namespace-specific mapping that was selected. For example, the external-namespace-specific mapping may include an associative array accessed by index values that correspond to hash function outputs. The array may include, for example, reserved memory for one or more internal-namespace device identifiers at sequential index values of the array, for instance, ranging from an array index of zero up to an array index of 10 million. Individual values of the array may be accessed by requesting a value at an index position of the array, for example, a request for the value at index position 100,256 of the array may return a set of five internal-namespace device identifiers mapped to a corresponding external-namespace device identifier. Thus, the output of the hash function may serve as a key, in the form of an index of the array, that is paired with a value, in the form of a collection of internal-namespace device identifiers mapped to the index, the key, and the external-namespace device identifier that hashes to the index/key. A variety of different hash functions may be used, for example, a function in which a threshold amount of less significant digits serves as the hash output.
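By way of illustration only, the hash-indexed lookup described above might be sketched as follows. The names index_for and NamespaceMapping are hypothetical, SHA-256 reduced modulo the array size is merely one possible hash function, and a Python dictionary stands in for the reserved array so that the sketch stays small. Hash collisions are not handled in this sketch.

```python
import hashlib

ARRAY_SIZE = 10_000_000  # index range taken from the example above


def index_for(external_id: str, size: int = ARRAY_SIZE) -> int:
    """Map an external-namespace device identifier to an array index."""
    digest = hashlib.sha256(external_id.encode("utf-8")).digest()
    # Reducing modulo the array size keeps only the less significant part
    # of the digest, so the result always falls within the index range.
    return int.from_bytes(digest, "big") % size


class NamespaceMapping:
    """One external-namespace-specific mapping (hypothetical sketch)."""

    def __init__(self, size: int = ARRAY_SIZE):
        self.size = size
        # Sparse stand-in for an array with reserved memory at every index.
        self.slots = {}

    def add(self, external_id: str, internal_id: str) -> None:
        index = index_for(external_id, self.size)
        self.slots.setdefault(index, set()).add(internal_id)

    def lookup(self, external_id: str) -> set:
        # A single indexed access replaces a scan over every entry.
        return self.slots.get(index_for(external_id, self.size), set())
```

In this sketch, a call such as lookup("some-advertising-id") computes one hash and performs one indexed read, regardless of how many identifiers the mapping holds.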

One advantage of this technique is that access times are relatively low for relatively large collections of data. Rather than iterating through each of the entries in the array to identify an entry corresponding to an external-namespace device identifier obtained in the network transaction record (which in the above example may include as many as 10 million iterations, each having associated memory access requests and responses), some embodiments may perform a single lookup that directly accesses the desired collection of internal-namespace device identifiers (e.g., access in the form of reading or writing).

In some cases, other data structures faster than iterated searches may be used to expedite access. In some cases, key-value pairs of the external-namespace device identifiers and one or more internal-namespace device identifiers may be arranged in a sorted list, sorted by the external-namespace device identifiers, and records may be accessed with a binary search. In another example, the internal-namespace device identifiers may be arranged and accessed through a tree data structure in which different portions of the tree correspond to different portions of the external-namespace device identifier, like in a prefix tree or in a binary tree, like a balanced binary tree. For instance, a most significant digit of the external-namespace device identifier encoded in binary format may correspond to a root of the tree, and depending upon whether that value is one or zero, the tree may branch to another set of nodes that correspond to the second most significant digit, which may then branch according to the third most significant digit, with leaf nodes corresponding to mappings to sets of internal-namespace device identifiers. Some embodiments may perform a recursive traversal of the tree to access a record of internal-namespace device identifiers.
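As a non-limiting sketch of the two alternatives just described, the following shows a sorted list searched with a binary search and a binary prefix tree keyed on the bits of an integer identifier, most significant bit first. The class names SortedMapping and BitTrie, the 8-bit default width, and the use of integer identifiers in the tree are assumptions made for illustration. The traversal here is iterative rather than recursive, which visits the same nodes.

```python
import bisect


class SortedMapping:
    """Key-value pairs sorted by external identifier, found by binary search."""

    def __init__(self, pairs):
        items = sorted(pairs.items())
        self._keys = [key for key, _ in items]
        self._values = [value for _, value in items]

    def lookup(self, external_id):
        i = bisect.bisect_left(self._keys, external_id)
        if i < len(self._keys) and self._keys[i] == external_id:
            return self._values[i]
        return None


class BitTrie:
    """Binary prefix tree; each level branches on one bit of the identifier."""

    def __init__(self, bits=8):
        self.bits = bits
        self.root = {}

    def _path(self, external_id):
        # Bits from most significant to least significant.
        return [(external_id >> s) & 1 for s in range(self.bits - 1, -1, -1)]

    def insert(self, external_id, internal_ids):
        node = self.root
        for bit in self._path(external_id):
            node = node.setdefault(bit, {})
        node["ids"] = set(internal_ids)  # leaf holds the mapped identifiers

    def lookup(self, external_id):
        node = self.root
        for bit in self._path(external_id):
            node = node.get(bit)
            if node is None:
                return None
        return node.get("ids")
```

Both structures locate a record in a number of steps that grows with the logarithm of the collection size (or with the identifier length, for the tree), rather than with the collection size itself.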

Next, some embodiments include accessing a graph of internal-namespace device identifiers at an index of an associative array equal to the hash value, as indicated by block 110. In some cases, the plurality of internal-namespace device identifiers are arranged with metadata indicating information about those internal-namespace device identifiers, and in some cases relationships therebetween. In some cases, those relationships are in the form of a graph having nodes corresponding to internal-namespace device identifiers and edges indicating versioning relationships between those identifiers, for instance, indicating that one device identifier supersedes another or was later determined to be a more reliable or more convenient indicator of the information indicated by the previous one. In some cases, these relationships may branch, for example, indicating that two internal-namespace device identifiers likely correspond to a single older one. In some cases, these relationships may merge, for example, indicating that two internal-namespace device identifiers were later consolidated. In some cases, the relationships do not indicate versioning information (which is not to suggest that any other feature described herein is not also amenable to variation), but rather merely indicate that the internal-namespace device identifiers are related to the same external-namespace device identifier.

In some cases, these relationships may be accessed when determining which internal-namespace device identifier to use. For instance, a query may be received requesting data pertaining to a particular duration of time and current internal-namespace device identifier. Embodiments may crawl backward through the graph of device identifiers to identify records relating to older versions, in some cases, selecting those records associated with an older internal-namespace device identifier in use at the particular duration of time.
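One hypothetical way to crawl backward through such a graph is sketched below. The IdNode class, its in_use_from timestamp, and the predecessors list are illustrative assumptions about the node metadata and the versioning edges, and the sketch assumes an identifier supersedes its predecessors at the moment it comes into use.

```python
from dataclasses import dataclass, field


@dataclass
class IdNode:
    internal_id: str
    in_use_from: float  # assumed metadata: when this identifier came into use
    predecessors: list = field(default_factory=list)  # older versions


def ids_in_use_at(current: IdNode, when: float) -> list:
    """Walk versioning edges backward and return identifiers in use at `when`."""
    found, stack, seen = [], [current], set()
    while stack:
        node = stack.pop()
        if node.internal_id in seen:
            continue
        seen.add(node.internal_id)
        if node.in_use_from <= when:
            # This version was already in use at the requested time.
            found.append(node.internal_id)
        else:
            # Too new for the requested time, so keep crawling to older versions.
            stack.extend(node.predecessors)
    return found
```

Where two older identifiers were later consolidated into one, the crawl in this sketch returns both older identifiers for a time before the merge.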

It should be noted that data structures referenced herein need not be labeled with the same name in program code to serve as an instance of those data structures. For example, graphs may be encoded in a variety of different formats, including data structures explicitly labeled as graphs, as well as in matrices, relational databases, hierarchical serialized data format documents, and the like. Similarly, hash tables need not be explicitly labeled in program code as a hash table, if the underlying functionality of a hash table is present, and the same is true of the above-described trees and sorted lists.

Next, some embodiments may include determining whether the graph of internal-namespace device identifiers satisfies filter criteria, as indicated by block 112. In some cases, the filter criteria may be criteria indicative of whether the collection of internal-namespace device identifiers are likely valid mappings to the corresponding external-namespace device identifier. For example, filter criteria may include a threshold count of internal-namespace device identifiers, with a count of internal-namespace device identifiers exceeding the threshold being deemed unreliable, and indicating that the mapping should be filtered from results. For example, a network activity data provider may have a default external-namespace device identifier applied to otherwise unidentified computing devices, and that default external-namespace device identifier may have several hundred or several thousand internal-namespace device identifiers mapped thereto in an unreliable mapping.
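A minimal sketch of the count-based filter follows. The function name filter_by_count and the default threshold of 100 are assumptions for illustration only; the disclosure does not fix a particular threshold.

```python
def filter_by_count(internal_ids, max_ids=100):
    """Drop the whole mapping when it holds more identifiers than the threshold."""
    ids = set(internal_ids)
    # An oversized collection suggests a default or shared external identifier,
    # so the mapping is treated as unreliable and nothing is returned.
    return set() if len(ids) > max_ids else ids
```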

In another example, the filter criteria may compare mappings in multiple external-namespace-specific mappings to detect inconsistent mappings. For example, a given internal-namespace device identifier may be mapped to an external-namespace device identifier corresponding to a first operating system in a first external-namespace-specific mapping, and that same internal-namespace device identifier may be mapped to another external-namespace device identifier corresponding to a second, different operating system in a second, different external-namespace-specific mapping. The likelihood of a computing device changing operating systems is often relatively low, so one or both of these mappings may be filtered from results of an access request as unreliable.
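The cross-mapping consistency check might be sketched as below, assuming (hypothetically) that each external-namespace-specific mapping can report the operating system implied for each internal-namespace device identifier it contains. The function name and the input shape are illustrative.

```python
def conflicting_internal_ids(os_by_mapping):
    """Return internal identifiers tied to different operating systems.

    os_by_mapping: {mapping_name: {internal_id: operating_system}} (assumed shape)
    """
    first_seen = {}
    conflicts = set()
    for mapping in os_by_mapping.values():
        for internal_id, os_name in mapping.items():
            # Remember the first operating system seen for each identifier and
            # flag the identifier if a later mapping disagrees.
            if first_seen.setdefault(internal_id, os_name) != os_name:
                conflicts.add(internal_id)
    return conflicts
```

Identifiers returned by this sketch are candidates for filtering; whether one or both conflicting mappings are dropped is left to the embodiment.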

In another example, internal-namespace device identifiers may be associated with a reliability score indicating a likelihood that the corresponding mapping is correct. In some cases, this reliability score may be based on the strength of matches described above with reference to block 84 of FIG. 5, e.g., a half-life aged amount of corroborating records confirming the mapping. Those mappings having less than a threshold reliability score may be filtered from access requests (e.g., omitted).
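One possible reading of a half-life aged amount of corroborating records is that each corroborating record contributes a weight that halves for every half-life elapsed since the record was observed. The sketch below follows that reading; the function names, the input shape, and the choice of timestamps in arbitrary consistent units are assumptions.

```python
def aged_score(corroboration_times, now, half_life):
    """Sum of corroborating records, each weighted by 0.5 ** (age / half_life)."""
    return sum(0.5 ** ((now - t) / half_life) for t in corroboration_times)


def reliable_ids(corroborations, now, half_life, min_score):
    """Keep identifiers whose aged score meets the threshold.

    corroborations: {internal_id: [timestamps of corroborating records]}
    """
    return {
        internal_id
        for internal_id, times in corroborations.items()
        if aged_score(times, now, half_life) >= min_score
    }
```

Under this weighting, a mapping that stops receiving corroboration gradually falls below the threshold rather than remaining trusted indefinitely.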

Next, some embodiments may update one or more profiles associated with the internal-namespace device identifiers that were accessed in block 110 and not filtered from results in block 112, as indicated by block 114. In some cases, updating the profiles may include updating a profile of a place or of a person (which may be a profile of a person having multiple devices associated thereto, or may be a profile of a device, with one person potentially having multiple device profiles). In some cases, the updating may include updating one or more of the above-described attribute scores based on a geolocation in the network transaction record obtained in block 104, as described elsewhere herein. In some cases, the profile may be identified by, and selected according to, an internal-namespace device identifier, for example uniquely. Thus, in some cases, multiple profiles may be updated corresponding to multiple internal-namespace device identifiers that were accessed and passed the filters.

In some cases, blocks 106 through 112 may serve other objectives beyond enriching profiles. For example, the same operations may be performed in responding to a query to read values based on a supplied external-namespace device identifier. For instance, some embodiments may receive a query from a party accessing the geolocation analytics platform to obtain information about a given mobile computing device (like when responding to an ad request from that mobile computing device), and that query may include an external-namespace device identifier (which may not include an internal-namespace device identifier, and which for at least some of the external namespaces is different from the internal-namespace device identifier). Some embodiments may execute the operations of blocks 106 through 112 to obtain one or more internal-namespace device identifiers, and then with those identifiers access one or more profiles associated thereto to provide a response to the query, for example, including some or all of the profile or an analysis based on the profile in the response. In some cases, these responses may be provided relatively quickly, as many of these types of queries are often relatively latency sensitive, for example, within less than 200 ms, and in some cases within less than 50 ms, in order to respond to a bid opportunity for a mobile advertisement within the bid opportunity window. Again, the data structures described above that afford relatively fast access are expected to facilitate these response times (though again, embodiments are not limited to systems that provide these benefits, as various independently useful inventive techniques are described).

FIG. 7 shows an example of a data structure upon which the process of FIG. 6 may operate. FIG. 7 illustrates an example of a multi-namespace mapping 120. In some embodiments, the multi-namespace mapping 120 may include a plurality of external-namespace-specific mappings 122 and 124 (with two shown, but embodiments are expected to include substantially more, for example 12 or more). The selection of block 106 may include selecting among these examples.

Each of the external-namespace-specific mappings 122 and 124 may include a plurality of key-value pairs 126. In the illustrated arrangement, the key-value pairs take the form of an associative array, with index values 128 serving as the keys, and array values 130 serving as the values of the key-value pairs. As discussed above, in some cases, the index values 128 may be values in a range of outputs of a hash function that takes as an input a corresponding external-namespace device identifier and outputs an index value of the array, in this example ranging from 000000 to ffffff in hexadecimal, but commercial embodiments may be even larger.

As illustrated, the associative array may reserve memory with the operating system for each potential output of the hash function, in some cases regardless of whether any external-namespace device identifier in fact yields a hash function output corresponding to that index value. Thus, some of the indices of the associative array may be empty. Similarly, in some cases, hash collisions may occur where two external-namespace identifiers hash to the same value, in which case a record may be added to the corresponding position in the associative array indicating the collision and subdividing the internal-namespace device identifiers with separate mappings to the separate external-namespace device identifiers that collide.
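The collision handling described above might be sketched by keeping, at each index, a small record keyed by the colliding external-namespace device identifiers, so that each keeps its own set of internal-namespace device identifiers. The class name CollisionAwareArray is hypothetical, SHA-256 is an assumed hash function, the default size matches the 000000 to ffffff range of the illustrated example, and a dictionary again stands in for the reserved array.

```python
import hashlib


class CollisionAwareArray:
    """Associative array whose slots subdivide by external identifier."""

    def __init__(self, size=0x1000000):  # indices 000000 through ffffff
        self.size = size
        self.slots = {}  # sparse stand-in for memory reserved at every index

    def _index(self, external_id: str) -> int:
        digest = hashlib.sha256(external_id.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % self.size

    def add(self, external_id: str, internal_id: str) -> None:
        # Each slot maps every external identifier that hashes there to its
        # own set, so colliding identifiers never share internal identifiers.
        bucket = self.slots.setdefault(self._index(external_id), {})
        bucket.setdefault(external_id, set()).add(internal_id)

    def lookup(self, external_id: str) -> set:
        bucket = self.slots.get(self._index(external_id), {})
        return bucket.get(external_id, set())
```

Constructing the array with a size of 1 forces every identifier into the same slot, which is a convenient way to exercise the collision path in this sketch.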

In this example, the internal-namespace device identifiers are represented by reference number 132, and relationships therebetween are represented by reference number 134. As illustrated, a variety of different relationships therebetween may be represented. Further, each of the nodes 132 may have various metadata, like the above-described reliability scores, and in some cases timestamps indicating ages of the identifiers since they were added to the system. In some cases, each of the circles corresponds to an internal-namespace device identifier (like a 10 digit value from a counter that increments each time a new device is detected by the geolocation analytics system, a hash of one of the external-namespace device identifiers, or a selection of one of the external-namespace device identifiers from among a plurality of such identifiers probabilistically determined to be likely associated with a given device). As discussed above, a variety of other types of data structures may be used to encode the mappings, including binary trees, prefix trees, and sorted lists.

In some cases, the process of FIG. 6 and the data structure of FIG. 7 may be implemented and updated with a computing architecture for handling a real-time data feed. The computing architecture may be designed to operate at relatively large scale and low latency. The architecture may include a message broker, described below, that communicates with a distributed processing system, described below, to access and update records in an in-memory database, described below.

In some cases, a feed of network activity log data may be received by a message broker configured to handle real-time data feeds, for instance with a distributed transaction log, like with Apache Kafka™. Some embodiments may execute the message processing system on a cluster of servers. In some cases, messages may be arranged in topics of messages, with each message having a key, a value, and a timestamp. In some cases, each topic may be stored in a partitioned log that has an ordered immutable sequence of records to which the servers append new records in the topic, e.g., with a sequential id number within the partition. In some cases, the partitions may be distributed among multiple servers to accommodate larger data sets than a single server can support, and in some cases, the partitions may be replicated across servers (e.g., partially or fully) to support fault tolerance (i.e., continued operation even when one server fails in the cluster). In some cases, the servers managing a partition may designate one server as a leader and others as followers, where the leader server may manage read and write requests for the partition, and the followers replicate the partition. The followers may select a new leader upon failure of the existing leader to provide fault tolerance, a feature that becomes increasingly important as computing tasks are distributed over larger collections of computing devices. In some cases, downstream computing processes, called consumers, may be arranged in groups and may subscribe by topic, and messages may be routed to different ones (e.g., in round robin fashion) of the members of the consumer group, thereby providing some load balancing for concurrent downstream processing. In some cases, the operations of obtaining network activity records may be performed with this architecture.
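The partitioned log and round-robin routing described above can be modeled, purely for illustration, with in-memory lists. The sketch below does not use or reproduce any message broker's API; the names PartitionedTopic and round_robin are hypothetical, and choosing a partition from a CRC-32 of the message key is an assumption rather than something the disclosure specifies.

```python
import zlib
from itertools import cycle


class PartitionedTopic:
    """In-memory model of a topic stored as append-only partitions."""

    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: str, value, timestamp: float):
        # Assumed policy: messages with the same key land in the same partition.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value, timestamp))
        # The position in the partition serves as the sequential id number.
        return p, len(self.partitions[p]) - 1


def round_robin(messages, consumers):
    """Route messages to the members of a consumer group in round-robin order."""
    assignment = {consumer: [] for consumer in consumers}
    for message, consumer in zip(messages, cycle(consumers)):
        assignment[consumer].append(message)
    return assignment
```

Replication, leader election, and persistence are omitted from this sketch; it only illustrates ordering within a partition and load spreading across consumers.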

The message processing system may assign tasks to a fault-tolerant, concurrent processing system, like Apache Spark™, which may access records in an extensible NoSQL database to update records in the data structure of FIG. 7 and access and update various user profiles with relatively low latency. Again, the processing system may include concurrently operating processes, distributed over a relatively large number of computing devices (e.g., more than 10) with fault-tolerant data structures. In some cases, received messages and intermediate results may be stored in a resilient distributed dataset (RDD) that replicates the data across computing devices in the form of an immutable collection of objects. Each RDD may have a plurality of partitions into which data pertaining to device mappings are arranged. In some cases, the RDDs having intermediate data results may be accessed in a sequence of operations, as part of a distributed shared memory (e.g., for the various filter criteria), without committing the data to disk or serializing the data, which may slow computations. In some cases, a cluster manager process may coordinate operations that effectuate other portions of the process of FIG. 6.

In some cases, the output of updates may be stored in a non-relational (i.e., non-tabular) database, like a NoSQL database configured to provide low-latency response to queries focused on relationships between devices and places, for instance with an Aerospike database. In some cases, the database is an in-memory database that replicates data across multiple computing devices for fault tolerance. In some cases, queries may be serviced by accessing the database in main memory (rather than on disk), which often provides several orders of magnitude faster response times. In some cases, persistent records in flash storage may be updated to match the in-memory version.

In some embodiments, the above techniques may operate on the system of FIG. 8, which illustrates a computing environment 210 having an example geolocation analytics platform 212. Embodiments of the geolocation analytics platform 212 may be implemented with one or more of the computing devices described below with reference to FIG. 9, e.g., by processors executing instructions stored in the below-described memory for providing the functionality described herein. FIG. 8 shows a functional block diagram of an example of the geolocation analytics platform 212. While the functionality is shown organized in discrete functional blocks for purposes of explaining the software and hardware by which the geolocation analytics platform 212 may be implemented in some embodiments, it is important to note that such hardware and software may be intermingled, conjoined, subdivided, replicated, or otherwise differently arranged relative to the illustrated functional blocks. Due to the size of some geographic data sets (which may be as large as 100 billion content requests or geolocations, or larger, in some use cases), some embodiments may include a plurality of instances of the geolocation analytics platform 212 operating concurrently to evaluate data in parallel, and some embodiments may include multiple instances of computing devices instantiating multiple instances of some or all of the components of the geolocation analytics platform 212, depending on cost and time constraints.

The geolocation analytics platform 212 may be understood in view of the exemplary computing environment 210 in which it operates. As shown in FIG. 8, the computing environment 210 further includes a plurality of geographic-data providers 214, the Internet 216, a plurality of mobile user devices 218, a plurality of user-data providers 220, a content server 222, a fraud detector 224, and a site selector 226. While a relatively small number of the above-described components are illustrated, it should be understood that embodiments are consistent with, and likely to include, substantially more of each component, such as dozens of geographic-data providers 214 and user-data providers 220, hundreds of fraud detectors 224, content servers 222, and site selectors 226, and millions or tens of millions of mobile user devices 218. Each of these components may communicate with the geolocation analytics platform 212 or one another via the Internet 216. Some such communications may be used to provide data by which audiences are classified according to geolocation history and other parameters, and some embodiments may use classified audiences for various purposes, such as serving content, detecting financial fraud, selecting real-estate sites, or the like. The components of the computing environment 210 may connect to one another through the Internet 216 and, in some cases, via various other networks, such as cellular networks, local area networks, wireless area networks, personal area networks, and the like.

FIG. 8 shows three geographic-data providers 214, but again, embodiments are consistent with substantially more instances, for example, numbering in the hundreds of thousands. The geographic-data providers 214 are shown as network connected devices, for example, servers hosting application program interfaces (APIs) by which geographic data is requested by the geolocation analytics platform 212, or in webpages from which such data is retrieved or otherwise extracted. It should be noted, however, that in some cases the geographic data may be provided by other modes of transport. For instance, hard-disk drives, optical media, flash drives, or other memory may be shipped by physical mail and copied via a local area network to on-board memory accessible to the geolocation analytics platform 212. In some cases, the geographic data is acquired in batches, for example, periodically, such as daily, weekly, monthly, or yearly, but embodiments are consistent with continuous (e.g., real-time) data feeds as well. Thus in some cases, the geographic-data providers 214 may provide geolocation histories that are non-contemporaneous (relative to when they are acquired) and span a relatively large period of time, such as several hours, several weeks, or several months in the past.

In many cases, the entity operating the geolocation analytics platform 212 does not have control over the quality or accuracy of the provided geographic data, as that data is often provided by a third party, for instance, sellers of geocoded advertising inventory, the data being provided in the form of ad request logs from various publishers. For instance, the geographic-data providers 214 may be mobile website publishers, retargeting services, and providers of mobile device applications, or native apps. In some cases, the geographic data comprehensively canvasses a large geographic region, for example, every zip code, county, province, or state within a country, or the geographic data may be specific to a particular area, for example, within a single province or state for data gathered by local government or local businesses. A publisher acting as the provider of the geographic data may be an entity with geocoded advertising inventory to sell, e.g., ad impressions up for auction (e.g., logged over time) that are associated with a geographic location at which the entity represents the ad will be presented. In some cases, pricing for such advertising inventory is a function, in part, of the quality and accuracy of the associated geographic locations.

In some cases, the geographic-data providers 214 may provide location history data (e.g., from the mobile devices 218), such as ad request logs indicating, for instance, a plurality of requests for advertisements from publishers (e.g., operators of various websites or mobile device native applications), each request being for an advertisement to be served at a geolocation specified in the request. The geographic location specified in a given request may be used by an advertiser to determine whether to bid on or purchase the right to supply the requested advertisement, and the amount an advertiser wishes to pay may depend on the accuracy and quality of the identified geolocation. These location history records may contain a plurality of such requests, each having a geolocation (e.g., a latitude coordinate and a longitude coordinate specifying where a requested ad will be served), a unique identifier such as a mobile device ID (e.g., a device identifier of an end user device 218 upon which the ad will be shown), and a timestamp. In some cases, the device identifier may be a Unique Device Identifier (UDID) or an advertiser or advertising specific identifier, such as an advertising ID.

In FIG. 8, three mobile user devices 218 are illustrated, but it should be understood that embodiments are consistent with (and most use cases entail) substantially more user devices, e.g., more than 100,000 or more than one million user devices. The illustrated user devices 218 may be mobile handheld user devices, such as smart phones, tablets, or the like, having a portable power supply (e.g., a battery) and a wireless connection, for example, a cellular or a wireless area network interface, or wearable user devices, like smart watches and head-mounted displays. Examples of computing devices that, in some cases, are mobile devices are described below with reference to FIG. 9. User devices 218, however, are not limited to handheld mobile devices, and may include desktop computers, laptops, vehicle in-dash computing systems, living room set-top boxes, and public kiosks having computer interfaces. In some cases, the user devices 218 number in the millions or hundreds of millions and are geographically distributed, for example, over an entire country or the planet.

Each user device 218 may include a processor and memory storing an operating system and various special-purpose applications, such as a browser by which webpages and advertisements are presented, or special-purpose native applications, such as weather applications, games, social-networking applications, shopping applications, and the like. In some cases, the user devices 218 include a location sensor, such as a global positioning system (GPS) sensor (or GLONASS, Galileo, or Compass sensor) or other components by which geographic location is obtained, for instance, based on the current wireless environment of the mobile device, like SSIDs of nearby wireless base stations, or identifiers of cellular towers in range. In some cases, the geographic locations sensed by the user devices 218 may be reported to the content server 222 for selecting content based on location to be shown on the mobile devices 218, and in some cases, location histories (e.g., a sequence of timestamps and geographic location coordinates) are acquired by the geographic-data providers 214, which may include content providers. In other cases, geographic locations are inferred by, for instance, an IP address through which a given device 218 communicates via the Internet 216, which may be a less accurate measure than GPS-determined locations. Or in some cases, geographic location is determined based on a cell tower to which a device 218 is wirelessly connected. Depending on how the geographic data is acquired and subsequently processed, that data may have better or less reliable quality and accuracy.

In some use cases, the number of people in a particular geographic area at a particular time as indicated by such location histories may be used to update records in the geolocation analytics platform 212. Location histories may be acquired by batch, e.g., from application program interfaces (APIs) of third-party providers, like cellular-network operators, advertising networks, or providers of mobile applications. Batch formatted location histories are often more readily available than real-time locations, while still being adequate for characterizing longer term trends in geographic data. And some embodiments may acquire some locations in real time (e.g., within 2 seconds of a request), for instance, for selecting content (like an advertisement, review, article, or business listing) to be displayed based on the current location.

The user-data providers 220 may provide data about users that is not necessarily tied to geolocation, such as purchasing history, media viewing history, automotive records, social networking activity, and the like. In some cases, user-data providers 220 include credit card processors, banks, cable companies, or television rating services. In some embodiments, user-data providers include microblogging services, location check-in services, or various other social networks. In some cases, audience classification according to geolocation may be supplemented with such data, for instance, according to the appearance of various keywords in social network posts, linkages between users indicated by social networks, or patterns in buying or reviewing behavior. In some cases, various features may be extracted from such data and included in the analysis described below for identifying audiences. In some cases, the techniques described in U.S. Provisional Patent Application 62/244,768, filed 22 Oct. 2015, titled DETECTING INFLUENCERS IN SOCIAL NETWORKS WITH LOCATION DATA, may be executed by the illustrated system to detect influencers and target content to them.

The illustrated content server 222 is operative to receive a request for content, select content (e.g., images and text), and send the content for display or other presentation to a user. One content server 222 is shown, but embodiments are consistent with substantially more, for example, numbering in the thousands. In some cases, the content is advertisements, and advertisements are selected or bid upon with a price selected based on the geographic location of a computing device upon which an advertisement will be shown, which may be indicated by one of the geographic-data providers/content servers, or such entities may also be a publisher selling the advertising inventory. Accordingly, the accuracy and quality of such geographic data may be of relevance to the parties selling or buying such advertising space. The selection or pricing of advertisements may also depend on other factors. For example, advertisers may specify a certain bid amount based on the attributes of the geographic area documented in the geolocation analytics platform 212, or the advertiser may apply various thresholds, requiring certain attributes before an advertisement is served, to target advertisements appropriately.

Some embodiments include a fraud detector 224, which may include an automated process run by a financial institution that detects anomalous behavior indicative of fraud based, in part, on correlations (or lack thereof) between financial transactions and patterns identified by the geolocation analytics platform 212. For instance, in some embodiments, the fraud detector 224 may submit a query to the geolocation analytics platform 212 based on a financial transaction, such as the purchase of a particular type of automobile, and the geolocation analytics platform 212 may respond with an audience classification of the user. In some embodiments, the fraud detector 224 may determine whether the user who engaged in the financial transaction is likely to be a member of the audience for such purchases based on the data provided by the geolocation analytics platform 212. For example, a user who is not a member of an audience in Austin, Tex. that is present in Austin golf courses regularly, upon purchasing a set of golf clubs, may trigger a fraud alert, when the fraud detector receives a report from the geolocation analytics platform 212 that the user is not a member of an Austin, Tex., golf-playing audience. In some cases, the fraud detector may maintain an ontology of types of financial transactions and audiences associated with those transactions. Upon receiving a record of a financial transaction, the fraud detector may query audiences corresponding to the user, the location, and the time of the transaction, and determine whether the responsive audiences match those associated with the type of financial transaction in the ontology. Fraud may be detected based on the absence of such matches.
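The ontology check described above might be sketched as a lookup from transaction type to expected audiences, followed by a test for overlap with the audiences returned for the user. The function name, the ontology contents, and the audience labels below are hypothetical examples, and treating an unknown transaction type as not suspicious is an assumption of this sketch.

```python
def is_suspicious(transaction_type, user_audiences, ontology):
    """Flag a transaction when none of its expected audiences include the user."""
    expected = ontology.get(transaction_type)
    if not expected:
        # No audiences are associated with this type, so there is nothing to match.
        return False
    # Fraud is suggested by the absence of any match.
    return not (set(expected) & set(user_audiences))


# Hypothetical ontology of transaction types and associated audiences.
EXAMPLE_ONTOLOGY = {"golf_clubs": {"austin_golfers"}}
```

For example, is_suspicious("golf_clubs", {"austin_commuters"}, EXAMPLE_ONTOLOGY) evaluates to True in this sketch because the user belongs to none of the audiences associated with that purchase.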

In some embodiments, the site selector 226 may categorize geographic areas as appropriate sites for various activities, such as positioning stores, allocating government resources, or distributing content into various zones based on geolocations frequented by audiences identified by the geolocation analytics platform 212. For instance, the site selector 226 may submit a request for zones in which members of a particular audience are present during lunch time and position restaurants in those zones.

In some embodiments, the geolocation analytics platform 212 may include a controller 228 that directs the activity of and routes data between the various components of the geolocation analytics platform 212. In some cases, the functionality of the controller may be divided into various processes, such as a separate controller for ingesting data, cleaning and normalizing data, classifying audiences and zones, targeting content, and evaluating the success of such targeting in driving visitation to various geographic locations. In some embodiments, activities other than programmatic content targeting may be performed as batch processes at times scheduled by the controller 228, such as daily or hourly, non-contemporaneously with when such data is used, to facilitate faster responses when the pre-processed data is used.

Some embodiments may include an ingest module 230 operative to retrieve data from the geographic-data providers 214 and user-data providers 220 via various APIs of such services. In some cases, such data may be routed by the controller 228 to a geographic-data evaluator 262, examples of which are described in U.S. patent application Ser. No. 14/553,422, which is incorporated by reference in its entirety. The geographic-data evaluator may evaluate the quality of geographic data by geographic-data provider and detect suspect, low-quality geographic data. Data from such providers with a history of providing low-quality data may be rejected from, or down-weighted in, the analyses described below, or such data providers may be stored with corresponding scores for purposes of bidding on the opportunity to serve advertisements or other content via such providers, for instance, in response to a content request for a website hosted by such a geographic-data provider.

Some embodiments may include an application program interface server 232, which may receive requests for information about audiences and geographic locations from the various entities operating devices 222, 224, and 226. In some cases, this may include requests by a third party content targeter for audiences corresponding to a current user device, at a current geolocation, requesting content at a current time (e.g., within the previous two seconds or so). In some cases, responsive data may include a list of audiences corresponding to these inputs or a list of scores for a plurality of audiences indicative of how well those inputs correspond to those audiences. In other examples, the request may include a request for an inventory of geographic areas corresponding to a specified audience, such as geographic areas or categories of places frequented by mobile device users who also frequent a given store or category of stores.

Some embodiments may include a geographic-data repository 234. The geographic-data repository 234, in some embodiments, stores geographic data from the geographic-data providers 214 and associated quality profiles of the geographic data, including measures of geographic data quality and accuracy provided by the geographic-data evaluator 262. In some embodiments, content providers, such as advertisers, or publishers, or others interested in the quality of geographic data from a given data provider 214 may query the geographic-data repository 234 for information output by the geographic-data evaluator 262.

Some embodiments may include a geographic information system 236. The geographic information system 236 may be configured to provide information about geographic locations in response to queries specifying a location or attribute of interest (or combinations thereof). In some embodiments, the geographic information system (GIS) 236 organizes information about a geographic area by quantizing (or otherwise dividing) the geographic area into area units, called tiles, that are mapped to subsets of the geographic area. In some cases, the tiles correspond to square units of area having sides that are between 10 meters and 1000 meters, for example, approximately 100 meters per side, depending upon the desired granularity with which a geographic area is to be described. Tiles are, however, not limited to square-shaped tiles, and may include other tilings, such as a hexagonal tiling, a triangular tiling, or other regular tilings (e.g., for simpler processing), semi-regular tilings, or irregular tilings (e.g., for describing higher density areas with higher resolution tiles, while conserving memory with larger tiles representing less dense areas). In some cases, such tilings may facilitate relatively fast access to data, such as in-memory data structures responsive to queries without retrieving data from a hard disk, though embodiments are not limited to systems that provide this benefit, which is not to suggest that any other feature described herein may not also be omitted in some embodiments.
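By way of a non-limiting illustration, the quantization of a geographic area into square tiles might be sketched as follows. The function name `tile_id`, the 100-meter default, and the use of a simple equirectangular projection are assumptions made for this sketch and are not drawn from the disclosure; under this projection tiles are only approximately square away from the equator.

```python
import math

TILE_SIZE_M = 100.0            # assumed tile side length, in meters
EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius, in meters


def tile_id(lat_deg, lon_deg, tile_size_m=TILE_SIZE_M):
    """Return the (row, col) index of the tile containing a coordinate.

    Uses an equirectangular projection (an assumption of this sketch):
    latitude and longitude are converted to arc lengths in meters and
    then floored onto a regular grid.
    """
    y_m = math.radians(lat_deg) * EARTH_RADIUS_M
    x_m = math.radians(lon_deg) * EARTH_RADIUS_M
    return (math.floor(y_m / tile_size_m), math.floor(x_m / tile_size_m))
```

Because the identifier is computed directly from the coordinate, a tile record may be looked up in an in-memory dictionary keyed by `(row, col)` without a spatial search.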

In some cases, polygons corresponding to businesses and other places, points corresponding to points of interest, and lines corresponding to roads, railroad tracks, and the like may also be stored in the geographic information system 236 as geographic features. In some cases, attributes of tiles overlapping such features may be mapped to these features, e.g., in proportion to the amount of area of a tile occupied by the corresponding feature and as a weighted combination of multiple tiles in which such a feature may be disposed, for instance, with such weights being proportional to the amount of area of the feature in each respective tile. In some cases, the described attributes of the tiles may be mapped directly to the features, e.g., with a record for each such feature, or subset of such a feature, like a floor of a store, or aisle of a store, with the features grouped according to the tile in which they are disposed for relatively fast searching of features by first retrieving a group of features in a single tile. To simplify the mapping, in some cases, irregular tiles may correspond to the boundaries of features.
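As one hedged example of the weighted combination described above, a feature spanning several tiles might receive an attribute score that is the area-weighted average of the tile scores. The function name and the dictionary inputs are hypothetical and chosen only for illustration.

```python
def feature_attribute_score(tile_scores, overlap_area_m2):
    """Area-weighted average of tile attribute scores for one feature.

    tile_scores:     tile id -> attribute score of that tile
    overlap_area_m2: tile id -> area of the feature lying inside that tile
    Both mappings are illustrative inputs assumed for this sketch.
    """
    total_area = sum(overlap_area_m2.values())
    if total_area == 0:
        return 0.0
    weighted = sum(tile_scores[t] * area for t, area in overlap_area_m2.items())
    return weighted / total_area
```

For instance, a store with 30 square meters in a tile scored 8 and 10 square meters in a tile scored 4 would receive a score of 7 under this weighting.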

In some cases, the attributes of a geographic area change over time. Accordingly, some embodiments divide each tile (or feature, if characterized separately, for example) according to time. For instance, some embodiments divide each tile into subsets of some duration of time, such as one week, one month, or one year, and attributes of the tile are recorded for subsets of that period of time. For example, the period of time may be one week, and each tile may be divided by portions of the week selected in view of the way users generally organize their week, accounting, for instance, for differences between work days and weekends, work hours, after work hours, mealtimes, typical sleep hours, and the like. Examples of such time divisions may include a duration for a tile corresponding to Monday morning from 6 AM to 8 AM, during which users often eat breakfast and commute to work, 8 AM till 11 AM, during which users often are at work, 11 AM till 1 PM, during which users are often eating lunch, 1 PM till 5 PM, during which users are often engaged in work, 5 PM till 6 PM, during which users are often commuting home, and the like. Similar durations may be selected for weekend days, for example 8 PM till midnight on Saturdays, during which users are often engaged in leisure activities. In some cases, the divisions of time are logically connected but are disjoint, for instance, morning and evening commute times may be classified in a single category of time corresponding to commuting. Each of these durations may be profiled at each tile.
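The time divisions given as examples above might be encoded, in one illustrative and non-limiting sketch, as a lookup from a timestamp to a category label. The labels, the table `WEEKDAY_DIVISIONS`, and the fallback categories are assumptions of this sketch. Note that the morning and evening commute windows share one label, reflecting the disjoint but logically connected divisions mentioned above.

```python
from datetime import datetime

# Hypothetical weekday divisions: (start_hour, end_hour, label).
# The hours follow the examples in the text; the labels are assumed.
WEEKDAY_DIVISIONS = [
    (6, 8, "commute"),
    (8, 11, "work"),
    (11, 13, "lunch"),
    (13, 17, "work"),
    (17, 18, "commute"),
]


def time_category(ts):
    """Map a datetime to a coarse time-of-week category."""
    if ts.weekday() >= 5:  # Saturday is 5, Sunday is 6
        if ts.weekday() == 5 and ts.hour >= 20:
            return "weekend_leisure"
        return "weekend_other"
    for start, end, label in WEEKDAY_DIVISIONS:
        if start <= ts.hour < end:
            return label
    return "weekday_other"
```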

In some embodiments, the geographic information system 236 includes a plurality of tile (or feature, if separately tracked) records, each such record corresponding to a different subset of a geographic area. Each tile (or feature) record may include an identifier, an indication of geographic area corresponding to the tile (which for regularly sized tiles may be the identifier from which location can be calculated or may be a polygon with latitude and longitude vertices, for instance), and a plurality of tile-time records. Each tile-time record may correspond to one of the above-mentioned divisions of time for a given tile, and the tile-time records may characterize attributes of the tile at different points of time, such as during different times of the week. Each tile-time (or feature-time) record may also include a density score indicative of the number of people in the tile at a given time. In some embodiments, each tile-time record includes an indication of the duration of time described by the record (e.g., lunch time on Sundays, or dinnertime on Wednesdays) and a plurality of attribute records, each attribute record describing an attribute of the tile at the corresponding window of time during some cycle (e.g., weekly). Some embodiments may include seasonal variants of such time designations, e.g., a set of time categories for the Christmas season, a set for Summer, and a set for the remainder of the year, constituting a type of time-tile record called a time-tile-season record.
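One way the tile, tile-time, and attribute records described above could be arranged in memory is sketched below. The class and field names are hypothetical, and the nesting shown is only one of many arrangements consistent with the description.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeRecord:
    attribute: str   # e.g., "high_end_dining" (label assumed)
    score: int       # normalized attribute score


@dataclass
class TileTimeRecord:
    window: str      # e.g., "fri_evening" (label assumed)
    density: float   # density score for the tile in this window
    attributes: dict = field(default_factory=dict)  # name -> AttributeRecord


@dataclass
class TileRecord:
    tile_id: tuple   # identifier from which location can be calculated
    tile_times: dict = field(default_factory=dict)  # window -> TileTimeRecord

    def attributes_at(self, window):
        """Return {attribute name: score} for one time window, or {}."""
        tile_time = self.tile_times.get(window)
        if tile_time is None:
            return {}
        return {name: rec.score for name, rec in tile_time.attributes.items()}
```

A query for a given tile and window then reduces to two dictionary lookups, for example `tile.attributes_at("fri_evening")`.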

The attributes may be descriptions of activities in which users (e.g., of third party services that provide data to the geolocation analytics platform 212) engage that are potentially of interest to advertisers or others interested in geographic data about human activities and attributes (e.g., geodemographic data or geopsychographic data). For example, some advertisers may be interested in when and where users go to particular types of restaurants, when and where users play golf, when and where users watch sports, when and where users fish, or when and where users work in particular categories of jobs. In some embodiments, each tile-time record may include a relatively large number of attribute records, for example, more than 10, more than 100, more than 1000, or approximately 4000 attribute records, depending upon the desired specificity with which the tiles are to be described. Each attribute record may include an indicator of the attribute being characterized and an attribute score indicating the degree to which users tend to engage in activities corresponding to the attribute in the corresponding tile at the corresponding duration of time. In some cases, the attribute score (or tile-time record) is characterized by a density score indicating the number of users expected to engage in the corresponding activity in the tile at the time. In some cases, attributes may be organized in a hierarchical ontology, for instance, businesses→retail→convenience_stores, or demographic→suburbanite→young_professional.

Thus, to use some embodiments of the geographic information system 236, a query may be submitted to determine what sort of activities users engage in at a particular block in downtown New York during Friday evenings, and the geographic information system 236 may respond with the attribute records corresponding to that block at that time. Those attribute records may indicate a relatively high attribute score for high-end dining, indicating that users typically go to restaurants in this category at that time in this place, and a relatively low attribute score for playing golf, for example. Or a query may request tiles or features for which a given attribute score is exhibited. Attribute scores may be normalized, for example, to a value from 0 to 10, with the value indicating the propensity of users to exhibit behavior described by that attribute. In some cases, scoring attributes according to a discrete set of normalized values may facilitate use of in-memory data structures that provide relatively fast access to information, though embodiments are not limited to systems that provide this benefit, which is not to suggest that any other feature described herein may not also be omitted in some embodiments. Further, the attribute scores may be pre-calculated before such scores are used in an analysis, as some forms of analysis are relatively latency sensitive, such as content selection, which users are expected to prefer to have happen within less than 500 milliseconds, while calculating attribute scores may take substantially longer.

In some cases, the user-profile repository 238 may store profiles of users of mobile devices 218 that are based on a user's geolocation history and in some cases data from user-data providers 220. In some cases, these user profiles may be created by a user profiler 256, an example of which is described in U.S. Pat. No. 8,489,596, the entire contents of which are incorporated by reference. The user profiler 256 may join the location histories of user devices corresponding to a user and tile records implicated by locations in those location histories to generate user profiles. Thus, users may be characterized according to the attributes of the places those users visit at the time the user visits those places. The generated user profiles may then be stored by the user profiler 256 in the user-profile repository 238.

The illustrated user-profile repository 238 includes a plurality of user-profile records, each record corresponding to the profile of a given user or a given mobile device 218, e.g., based on device mappings described above with profiles associated with one or more internal-namespace device identifiers. A user may have multiple profiles, one per device, or a single profile, e.g., with multiple devices. Each user-profile record may include an identifier of the record (which may be a value otherwise uncorrelated with the identity of the user to enhance privacy), and an identifier of the source or sources of the location histories from which the profile was created such that subsequent location histories can be matched with the profile (e.g., an account associated with a special-purpose native application, a cell phone number, or some other value, which may be hashed to enhance user privacy).

Each user-profile record may also include a plurality of profile-time (or profile-time-season) records indicating attributes of the user profile at different times during some cycle of time (e.g., portions of the week or month, or during other periods like those described above with reference to the geographic information system 236). In some cases, the profile-time records may correspond to the same durations of time as those of the time-tile records described above. Each profile-time record may include an indication of the duration of time being described (e.g., Thursdays at dinnertime, or Saturday midmorning) and a plurality of profile attribute records, each profile attribute record indicating the propensity of the corresponding user to engage in an activity, or exhibit a property, described by the attribute during the corresponding time of the profile-time record. The profile-time records may allow tracking of when users tend to engage in a given activity (e.g., time of day, day of week, week of year). In some embodiments, the profile attribute records correspond to the same set of attribute records described above with reference to the geographic information system 236. Each profile-attribute record may include an indication of the attribute being characterized (e.g., attending a children's soccer game, having brunch at a fast-casual dining establishment, parent running errands, or shopping at a mall) and a score indicating the propensity of the user to engage in the activity at the corresponding time, such as a normalized value from 0 to 10. The attribute records may further include a sample size, indicative of the number of samples upon which the attribute score is based, for weighting new samples, and a measure of variance among these samples (e.g., a standard deviation) for identifying outliers.
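As a hedged illustration of how a sample size and a measure of variance might be maintained alongside an attribute score, the sketch below applies Welford's online algorithm, a technique selected for this example rather than one named in the disclosure. The class name, field names, and the three-standard-deviation outlier rule are likewise assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class ProfileAttribute:
    n: int = 0         # sample size behind the score
    mean: float = 0.0  # running attribute score
    m2: float = 0.0    # running sum of squared deviations from the mean

    def std(self):
        """Sample standard deviation of the observations so far."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_outlier(self, x, k=3.0):
        """True if x lies more than k standard deviations from the mean."""
        s = self.std()
        return self.n > 1 and s > 0 and abs(x - self.mean) > k * s

    def update(self, x):
        """Fold a new sample in; each new sample is weighted by 1/n."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
```

A caller might test `is_outlier(x)` before `update(x)` and discard or down-weight samples that are flagged.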

As described below, the user-profile records may be used for a variety of purposes. For example, publishers operating content server 222 may submit to the geolocation analytics platform 212 a query identifying one of the user-profile records, such as a hashed value of a user account number or phone number, and the geolocation analytics platform 212 may respond with the attributes of the corresponding user at the current time. In some embodiments, to further enhance user privacy, queries may be submitted for a specific attribute to determine whether to serve content corresponding to the attribute, or a query may request a binary indication of whether the attribute score is above a threshold.

In another example, the user-profile repository 238 may be used by the user profiler 256 to augment the records in the geographic information system 236. For example, an index may be created for each attribute that identifies tiles where users having relatively strong scores (e.g., above a threshold) for the respective attribute tend to co-occur at given times. These indices may correspond to heat maps (though no visual representation need be created) indicating where, for example, users interested in golf tend to be during various times of the day, such that content-providers can select content based on this information, or related services may be positioned nearby. In some embodiments, an index may be created for each user attribute at each of the above-described divisions of time in the geographic information system 236, and these indices may be queried to provide relatively prompt responses relating to where users having a given attribute or combination of attributes tend to co-occur at various times. Precalculating the indices is expected to yield faster responses to such queries than generating responsive data at the time the query is received. For instance, using examples of these indices relating to fishing and employment in banking, an advertiser may determine that people who engage in fishing on the weekend and work in banking tend to drive relatively frequently along a particular stretch of road on Mondays during the evening commute, and that advertiser may purchase an advertisement for bass fishing boats as a source of relaxation for bankers on a billboard along that road in response.

In some cases, user profiles may be supplemented with data from the user-data providers 220. In some embodiments, a user-data repository 240 may store such data as it is acquired for further analysis. Further, in some embodiments, the quality of data from such data providers may be scored, and such scores may be associated with identifiers of the providers in the user-data repository 240. In some embodiments, this data may be down-weighted or rejected based on indicators of low quality.

Some embodiments may include an audience repository 242 storing records by which audience membership may be determined. These records, in some cases, may be created and accessed by an audience classifier 254. In some cases, audience membership is pre-calculated before a query is received, for example, for each recognized query within some parameter space, for instance, for every type of attribute record, pair of attribute records, or attribute record combined with larger geolocation area, like weekend golfers in the state of Texas. In some cases, parameters of models by which audience membership is determined may be stored in the audience repository 242, for example, learned parameters that are pre-calculated according to training sets. In some cases, an audience membership vector may be calculated based on a given geographic location, a given user identifier (e.g., a device identifier), and a given time, with each component of the vector indicating membership in a corresponding audience. In some cases, membership may be binary, or some embodiments may score membership, for example from 0 to 10 depending on the probability of membership in the corresponding audience given the inputs. In some cases, each component of the audience vector may be calculated according to an audience member function that is a combination (e.g., weighted sum) of feature functions. Examples of such feature functions may include scores indicating whether a given user is currently within a tile having a particular attribute score (or collection of attribute scores) above a threshold, whether a given user has visited tiles having a particular attribute score above a threshold at particular times more than a threshold amount of times within some trailing duration, and the like. In some cases, a collection of audience vectors for each user may be stored in the respective user profile, e.g., as a sparse matrix having rows or columns indexed according to times and geolocations at which the corresponding audience vector applies. In some cases, identifying feature functions with predictive value can be relatively challenging given the relatively large, high-dimensional search space of candidate feature functions in many commercially relevant implementations.
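A minimal sketch of an audience member function formed as a weighted sum of feature functions is given below. The audience names, weights, thresholds, and the layout of the context dictionary are all hypothetical and were chosen only to make the combination concrete; in practice such weights might be learned parameters.

```python
def in_tile_above(ctx, attribute, threshold):
    """1.0 if the user's current tile scores above threshold for attribute."""
    return 1.0 if ctx["current_tile_scores"].get(attribute, 0) > threshold else 0.0


def visits_above(ctx, attribute, min_visits):
    """1.0 if trailing visits to tiles strong in attribute exceed min_visits."""
    return 1.0 if ctx["trailing_visits"].get(attribute, 0) > min_visits else 0.0


# Hypothetical audiences: each is a list of (weight, feature function) terms.
AUDIENCES = {
    "weekend_golfers": [
        (0.4, lambda c: in_tile_above(c, "golf", 7)),
        (0.6, lambda c: visits_above(c, "golf", 3)),
    ],
    "anglers": [
        (1.0, lambda c: visits_above(c, "fishing", 3)),
    ],
}


def audience_vector(ctx):
    """Return {audience: score}, each score a weighted sum of its features."""
    return {
        name: sum(weight * fn(ctx) for weight, fn in terms)
        for name, terms in AUDIENCES.items()
    }
```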

Some embodiments may include a zone repository 244, which may include zone records populated by a zone classifier 252. Zones may be geographic areas associated with audiences. For example, some embodiments may identify geographic areas that students at a local university tend to visit, with the corresponding audience being likely students at a given university or collection of universities, or those who are regularly at such universities (e.g., more than a threshold amount of times in a trailing duration of time). In some cases, the zone repository may include zone records that list tiles or time tiles likely to be visited by members of particular audiences. In some cases, zones may be classified according to an amount of mutual information between events corresponding to audience membership and members of those audiences visiting particular tiles. In some cases, the mutual information may be calculated in terms of a conditional entropy, and tiles having the highest mutual information (for example, greater than a threshold amount of tiles, like a threshold percentage) may be selected for consideration as members of the zone for that audience.
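By way of example only, the mutual information between audience membership and visits to a given tile might be estimated from a two-by-two table of user counts as sketched below. The function name and argument layout are assumptions; the quantity computed, I(X;Y) = H(X) - H(X|Y), is written here in its equivalent joint-probability form.

```python
import math


def mutual_information(n11, n10, n01, n00):
    """Mutual information, in bits, between two binary events.

    n11: users who are audience members and visited the tile
    n10: members who did not visit
    n01: non-members who visited
    n00: non-members who did not visit
    """
    total = n11 + n10 + n01 + n00
    member, non_member = n11 + n10, n01 + n00
    visited, not_visited = n11 + n01, n10 + n00
    cells = (
        (n11, member, visited),
        (n10, member, not_visited),
        (n01, non_member, visited),
        (n00, non_member, not_visited),
    )
    mi = 0.0
    for n_xy, n_x, n_y in cells:
        if n_xy:  # empty cells contribute nothing
            mi += (n_xy / total) * math.log2(n_xy * total / (n_x * n_y))
    return mi
```

Tiles may then be ranked by this value for a given audience, with the highest-ranked tiles retained as zone candidates.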

In some cases, the selected candidate tiles may be clustered and resulting clusters may be designated as zones. Some embodiments may execute a density-based clustering algorithm, like DBSCAN, to establish groups corresponding to the resulting clusters and exclude outliers. Some embodiments may examine each of the geolocations reflected in the records and designate a tile as a core tile if at least a threshold amount of the other tiles in the records are within a threshold geographic distance or number of tiles. Some embodiments may then iterate through each of the tiles and create a graph of reachable geolocations, where nodes on the graph are identified in response to non-core corresponding tiles being within a threshold distance of a core tile in the graph, and in response to core tiles in the graph being reachable by other core tiles in the graph, where two tiles are reachable from one another if there is a path from one tile to the other tile where every link in the path is a core tile and the tiles in the link are within a threshold distance of one another. The set of nodes in each resulting graph, in some embodiments, may be designated as a cluster, and points excluded from the graphs may be designated as outliers that do not correspond to clusters. Outliers may be excluded from zones in some cases.
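The core-tile and reachability steps described above may be sketched, under stated assumptions, as a small DBSCAN-style routine over tile indices. The sketch assumes tiles are identified by integer `(row, col)` pairs, measures distance as a number of tiles (Chebyshev distance), and uses a quadratic neighbor search for brevity; none of these choices is required by the description.

```python
from collections import deque


def dbscan_tiles(tiles, eps, min_neighbors):
    """Cluster (row, col) tiles; return {tile: cluster id}, -1 for outliers."""
    tiles = list(dict.fromkeys(tiles))  # de-duplicate, keep input order

    def neighbors(t):
        return [
            u for u in tiles
            if u != t and max(abs(u[0] - t[0]), abs(u[1] - t[1])) <= eps
        ]

    # A core tile has at least min_neighbors other tiles within eps.
    core = {t for t in tiles if len(neighbors(t)) >= min_neighbors}
    labels = {t: -1 for t in tiles}
    cluster = 0
    for seed in tiles:
        if seed not in core or labels[seed] != -1:
            continue
        labels[seed] = cluster
        queue = deque([seed])
        while queue:
            t = queue.popleft()  # every queued tile is a core tile
            for u in neighbors(t):
                if labels[u] == -1:
                    labels[u] = cluster
                    if u in core:  # only core tiles extend the cluster
                        queue.append(u)
        cluster += 1
    return labels
```

Tiles still labeled -1 after the loop are the outliers that may be excluded from zones.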

Some embodiments may include a visit-metrics repository 246 having records created by a visitation rate module 248. In some cases, the records may indicate the degree to which content targeted to particular users succeeded in driving those users to visit a targeted geographic location, for example, records indicating whether an advertisement targeted to users in a particular neighborhood succeeded in driving those users to visit a particular store. In some cases, the visitation rate module 248 may include the visitation rate module of U.S. patent application Ser. No. 13/769,736, the entire contents of which are incorporated by reference. In some cases, visitation rates may be adjusted to account for undercounting of undetected people, for example, those not employing cell phones while in the targeted location or employing cell phones that are not detectable, for instance, due to lack of signal quality for a particular type of handset or carrier. In some cases, such undercounting may correlate with various attributes of the user, including the user's mobile device, and some embodiments may adjust detected visitation rates to account for such undercounting. Some embodiments may measure a marginal increase in an amount of visits to a target geographic location likely to be attributable to targeted content. For example, some embodiments may identify audience members, serve targeted content to some of the audience members (e.g., a treatment group), and compare visitation amounts (e.g., calculate a statistically significant amount of difference) between those audience members that received the targeted content and those that did not (e.g., a control group of the audience) to determine a marginal increase attributable to the targeted content. Feedback from such measurements may be used to tune audience classification algorithms or select among audiences, e.g., dynamically unselecting audiences for which a response fails to satisfy a visitation threshold. That said, not all embodiments necessarily provide these benefits, which is not to suggest that any other feature may not also be omitted in some cases.
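One non-limiting way to compare visitation between a treatment group and a control group is a pooled two-proportion z-test, sketched below. The choice of this particular test, the function name, and the return shape are assumptions of the sketch; other statistical comparisons would serve equally well.

```python
import math


def visitation_lift(treated_visits, treated_n, control_visits, control_n):
    """Return (lift, z): the difference in visit rates and its z-score."""
    p_treated = treated_visits / treated_n
    p_control = control_visits / control_n
    pooled = (treated_visits + control_visits) / (treated_n + control_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / treated_n + 1 / control_n))
    z = (p_treated - p_control) / se if se > 0 else 0.0
    return p_treated - p_control, z
```

Under the usual normal approximation, a z-score whose magnitude exceeds about 1.96 corresponds to significance at the five-percent level for a two-sided test.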

Some embodiments may include a programmatic content targeter 250. In some cases, this module may automatically determine whether to provide content and which content to provide, in some cases at the time of the content request, based on classification of audiences or zones. In some embodiments, the programmatic content targeter 250 may programmatically determine audience membership and determine a bidding amount for submitting a bid to an online auction to provide an advertisement to a given user. To facilitate relatively fast responses to such time-sensitive requests, some embodiments may pre-calculate zone classifications and audience classifications and index those classifications according to parameters of a content request (e.g., according to key values based on (such as hash values of) one or more of a device or user identifier, a geographic location, and a category of time corresponding to the time tile records). In some cases, bidding may be real-time, e.g., within less than 500 milliseconds of when an ad is requested, and often even sooner. In other cases, advertising space may be pre-purchased programmatically before ad requests, e.g., based on expected audience behavior in the coming hours or days. In other cases, other types of content may be programmatically targeted, e.g., business listings or articles based on audience membership. Programmatic targeting based on audience classification is expected to reduce labor costs relative to manual tuning and targeting of content. That said, not all embodiments necessarily provide these benefits, which is not to suggest that any other feature may not also be omitted in some cases.

Some embodiments may include an anonymized-user-identifier matcher 258, examples of which are described above (e.g., with FIGS. 4-5 corresponding to one set of examples and FIGS. 6-7 corresponding to another set, which is not to imply the two approaches may not be used in combination, which is another set of contemplated examples) and in U.S. patent application Ser. No. 14/334,066, the entire contents of which are incorporated by reference. In some cases, a user may switch mobile devices or be reassigned a device identifier. Re-creating a user profile for that user based on the new identifier can be time-consuming and particularly difficult at commercially-relevant scales. Accordingly, some embodiments of the matcher 258 may detect matches between geolocation patterns of a new user identifier and an old user identifier to assign that new identifier to an existing user profile when such matches are detected. This is expected to yield more accurate classifications of audiences based on more complete data for individuals using two different devices. That said, not all embodiments necessarily provide these benefits, which is not to suggest that any other feature may not also be omitted in some cases.

Some embodiments may further include a geographic-data projector 260, an example of which is described in U.S. patent application Ser. No. 13/938,974, the entire contents of which are incorporated by reference. In some cases, geographic-data providers may provide data at a relatively low resolution, e.g., census data reported at the zip code level. Some embodiments may unevenly project such values onto higher-resolution geographic areas (e.g., some instances of the tile records or corresponding geographic features) within the low-resolution area based on a distribution of a population within that larger geographic area. Accordingly, some embodiments may enrich the records of the geographic information system 236 by which audiences and zones are identified with information that would otherwise be inapplicable or inaccurately applied. That said, not all embodiments necessarily provide these benefits, which is not to suggest that any other feature may not also be omitted in some cases.
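A simplified sketch of such an uneven projection is shown below, in which a value reported for a low-resolution area is split among its tiles in proportion to tile population. Proportional allocation, the function name, and the even fallback for an unpopulated area are assumptions of this sketch and are not taken from the incorporated application.

```python
def project_to_tiles(area_value, tile_population):
    """Split area_value across tiles in proportion to tile population.

    tile_population: tile id -> population estimate (illustrative input).
    Falls back to an even split when no population is recorded.
    """
    if not tile_population:
        return {}
    total = sum(tile_population.values())
    if total == 0:
        share = area_value / len(tile_population)
        return {tile: share for tile in tile_population}
    return {
        tile: area_value * pop / total
        for tile, pop in tile_population.items()
    }
```

For example, a count of 1000 reported for a zip code whose tiles hold populations of 300, 100, and 0 would be allocated as 750, 250, and 0.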

The profiles may characterize a variety of attributes of users. In one illustrative use case, a location history may indicate that a user frequently visits geographic locations associated with tourism, and the profile of that user may be updated to indicate that the user frequently engages in tourism, which may be of interest to certain categories of advertisers. Or a user may spend their working hours in geographic areas associated with childcare and residences, and based on their location history, the profile of that user may be updated to indicate that the user likely engages in childcare for children younger than school age. Other examples are described below.

Further, as explained in detail below, the attributes associated with geographic locations may vary over time (for example, an area with coffee shops and bars may have a stronger association with consumption of breakfast or coffee in the morning, an association which weakens in the evening, while an association with entertainment or nightlife may be weaker in the morning and stronger in the evening). User profiles may be generated in accordance with the time-based attributes that predominate when the user is in a geographic area. And in some embodiments, user profiles may also be segmented in time, such that a portion of a given user's profile associated with a weekday morning may have different attributes than another portion of that user's profile associated with a weekend night, for instance.

The user profiles may be used by advertisers and others in a privacy-friendly fashion, such that users are expected to tend to opt in to sharing their location history. For example, the user profiles may be aggregated to identify geographic areas having a high density of a particular type of user at a particular time of the week, such as a sports stadium having a relatively large number of users associated with fishing as a hobby, or a children's soccer field in which a relatively large number of people associated with golfing as a hobby might tend to co-occur on weekend mornings. Such correlations may be presented to advertisers or others without disclosing information by which individual users can be uniquely identified. In other applications, user-specific information may be provided, for example, users who opt in to sharing their profiles may receive user-specific services or communications formulated based on the individual profile of that user.

Accounting for time when characterizing geographic areas is believed to yield relatively accurate characterizations of places, as the activities that people engage in at a given location tend to depend strongly on time of day and week. And for similar reasons, accounting for time when profiling users is expected to yield relatively accurate characterizations of the users. Generating profiles based on location history further offers the benefit of profiling users without imposing the burden of manually doing so on the users themselves, and using attributes of geographic areas in which the user travels is expected to yield relatively privacy-friendly data about the user. That said, not all embodiments offer all, or any, of these benefits, as various engineering and cost trade-offs are envisioned, and other embodiments may offer other benefits, some of which are described below.

As noted above, the user profiler 212 obtains data from the mobile devices 216 and the geographic information system 218 to output user profiles to the user-profile datastore 214 for use by the ad servers 222 or for other purposes. Accordingly, these components are described in this sequence, starting with inputs, and concluding with outputs.

The mobile devices 216 may be any of a variety of different types of computing devices having an energy storage device (e.g., a battery) and being capable of communicating via a network, for example via a wireless area network or a cellular network connected to the Internet 220. In some cases, the mobile devices 216 are handheld mobile computing devices, such as smart phones, tablets, or the like, or the mobile devices may be laptop computers or other special-purpose computing devices, such as an automobile-based computer (e.g., an in-dash navigation system). The mobile devices 216 may have a processor and a tangible, non-transitory machine-readable memory storing instructions that provide the functionality described herein when executed by the processor. The memory may store instructions for an operating system, special-purpose applications (apps), and a web browser, depending upon the use case. It should be noted, however, that the present techniques are not limited to mobile devices, and other computing devices subject to geolocation may also generate data useful for forming user profiles. For instance, set-top boxes, gaming consoles, or Internet-capable televisions may be geolocated based on IP address, and data from user interactions with these devices may be used to update user profiles, e.g., with user interaction indicating a time at which a user was at the geolocation corresponding to the device.

This software may have access to external or internal services by which the location of the mobile device may be obtained. For example, the mobile device may have a built-in satellite-based geolocation device (for instance a global-positioning system, or GPS, device or components operative to obtain location from other satellite-based systems, such as Russia's GLONASS system or the European Union's Galileo system). In another example, location may be obtained based on the current wireless environment of the mobile device, for example by sensing attributes of the wireless environment (e.g., SSIDs of wireless hotspots, identifiers of cellular towers and signal strengths, identifiers of low energy Bluetooth beacons, and the like) and sending those attributes to a remote server capable of identifying the location of the mobile device. In some embodiments, the location may be obtained based on an identifier of a network node through which the mobile device connects to the Internet, for example by geocoding an IP address of a wireless router or based on a location of a cellular tower to which the mobile device is connected. The location may be expressed as a latitude and longitude coordinate or an area, and in some cases may include a confidence score, such as a radius or bounding box defining an area within which the device is expected to be with more than some threshold confidence.

From time to time, the location of the mobile devices 216 may be obtained by the mobile devices. For example, when a user interacts with a special-purpose application, in some cases, the application may have permission to obtain the location of the mobile device and report that location to a third party server associated with the application, such that the location may be obtained by the user profiler 212 from the third party server. In another example, the user may visit a website having code that obtains the current location of the mobile device. This location may be reported back to the server from which the website was obtained or some other third party server, such as an ad server for an affiliate network, and location histories may be obtained from this server. In another example, locations of the mobile devices 216 may be obtained without the participation of the mobile device beyond connecting to a network. For instance, users may opt in to allowing a cellular service provider to detect their location based on cellular signals and provide that location to the user profiler 212. Depending upon how location is obtained, the location may be acquired intermittently, for example at three different times during a day when a user launches a particular application, or relatively frequently, for example by periodically polling a GPS device and reporting the location. In some cases, the location history may include locations obtained more than one second apart, more than one minute apart, more than one hour apart, or more, depending upon the use case.

Locations may be obtained in real time from mobile devices 216 by the user profiler 212, or in some embodiments, location histories may be obtained, e.g., from third party data providers. Each location history may include records of geographic locations of a given mobile device and when the mobile device was at each location. In some cases, a location history may include records of location over a relatively long duration of time, such as more than a preceding hour, day, week, or month, as some modes of acquiring location histories report or update location histories relatively infrequently. A location history for a given mobile device may include a plurality of (e.g., more than 10 or more than 100) location records, each location record corresponding to a detected location of the mobile device, and each location record including a geographic location and the time at which the mobile device was at the location. The location records may also include a confidence score indicative of the accuracy of the detected location. Geographic locations may be expressed in a variety of formats with varying degrees of specificity, for example as latitude and longitude coordinates, as tiles in a grid with which a geographic area is segmented (e.g., quantized), or in some other format for uniquely specifying places.
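By way of illustration only, a location history of the kind described above might be represented as in the following minimal sketch. The class names, field names, and the choice of a radius in meters as the confidence score are assumptions made for this example and are not prescribed by the present disclosure.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class LocationRecord:
    """One detected location of a device (hypothetical layout)."""
    latitude: float
    longitude: float
    timestamp: datetime
    # Assumed form of the confidence score: radius, in meters, within
    # which the device is expected to be. None when not reported.
    confidence_radius_m: Optional[float] = None


@dataclass
class LocationHistory:
    """A device identifier plus its location records."""
    device_id: str
    records: List[LocationRecord] = field(default_factory=list)

    def sorted_by_time(self) -> List[LocationRecord]:
        # Records may arrive out of order from third party providers.
        return sorted(self.records, key=lambda r: r.timestamp)


history = LocationHistory(
    device_id="device-1",
    records=[
        LocationRecord(40.0, -74.0, datetime(2024, 1, 2, tzinfo=timezone.utc)),
        LocationRecord(40.1, -74.1, datetime(2024, 1, 1, tzinfo=timezone.utc), 50.0),
    ],
)
ordered = history.sorted_by_time()
```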

The geographic information system 218 may be configured to provide information about geographic locations in response to queries specifying a location of interest. In some embodiments, the geographic information system 218 organizes information about a geographic area by quantizing (or otherwise dividing) the geographic area into area units, called tiles, that are mapped to subsets of the geographic area. In some cases, the tiles correspond to square units of area having sides that are between 10 meters and 1000 meters, for example approximately 100 meters per side, depending upon the desired granularity with which a geographic area is to be described. In other examples, the tiles have other shapes, e.g., hexagon shapes that are arranged in a two-dimensional hexagonal packing layout.
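One way to quantize coordinates into approximately square tiles is sketched below. This is an illustrative approximation, not the quantization used by any particular embodiment: the function name tile_id, the constant of roughly 111,320 meters per degree, and the use of a single reference latitude to scale longitude are all assumptions of the example.

```python
import math

# Approximate length of one degree of latitude, in meters (assumption).
METERS_PER_DEGREE = 111_320.0


def tile_id(lat, lon, tile_size_m=100.0, ref_lat=0.0):
    """Map a coordinate to a (row, col) tile index.

    Longitude is scaled by the cosine of a fixed reference latitude so
    that tiles are roughly square near that latitude; the grid is only
    an approximation far from it.
    """
    if tile_size_m <= 0:
        raise ValueError("tile_size_m must be positive")
    meters_per_degree_lon = METERS_PER_DEGREE * math.cos(math.radians(ref_lat))
    row = math.floor(lat * METERS_PER_DEGREE / tile_size_m)
    col = math.floor(lon * meters_per_degree_lon / tile_size_m)
    return (row, col)
```

For regularly sized tiles, the (row, col) pair can serve both as the tile identifier and as the indication of the area the tile covers.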

In some cases, the attributes of a geographic area change over time. Accordingly, some embodiments divide each tile according to time. For instance, some embodiments divide each tile into subsets of some period of time, such as one week, one month, or one year, and attributes of the tile are recorded for subsets of that period of time. For example, the period of time may be one week, and each tile may be divided by portions of the week selected in view of the way users generally organize their week, accounting, for instance, for differences between work days and weekends, work hours, after work hours, mealtimes, typical sleep hours, and the like. Examples of such time divisions may include a duration for a tile corresponding to Monday morning from 6 AM to 8 AM, during which users often eat breakfast and commute to work, 8 AM till 11 AM, during which users often are at work, 11 AM till 1 PM, during which users are often eating lunch, 1 PM till 5 PM, during which users are often engaged in work, 5 PM till 6 PM, during which users are often commuting home, and the like. Similar durations may be selected for weekend days, for example 8 PM till midnight on Saturdays, during which users are often engaged in leisure activities. Each of these durations may be profiled at each tile.
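The example time divisions above can be sketched as a lookup from a timestamp to a portion of the week. The function name time_bin, the bin labels, the "other" catch-all for hours not listed above, and the application of the 8 PM to midnight division to both weekend days are assumptions of this illustration.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# (start_hour, end_hour) pairs taken from the examples in the text;
# end_hour is exclusive.
WORKDAY_BINS = [(6, 8), (8, 11), (11, 13), (13, 17), (17, 18)]
WEEKEND_BINS = [(20, 24)]


def time_bin(ts):
    """Return (day_name, 'HH-HH') for the division containing ts.

    Hours that fall outside the listed divisions map to 'other'.
    """
    day_index = ts.weekday()  # Monday is 0
    bins = WEEKEND_BINS if day_index >= 5 else WORKDAY_BINS
    for start, end in bins:
        if start <= ts.hour < end:
            return (DAY_NAMES[day_index], f"{start:02d}-{end:02d}")
    return (DAY_NAMES[day_index], "other")
```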

In some embodiments, the geographic information system 218 includes a plurality of tile records, each tile record corresponding to a different subset of a geographic area. Each tile record may include an identifier, an indication of geographic area corresponding to the tile (which for regularly sized tiles may be the identifier), and a plurality of tile-time records. Each tile-time record may correspond to one of the above-mentioned divisions of time for a given tile, and the tile-time records may characterize attributes of the tile at different points of time, such as during different times of the week. Each tile-time record may also include a density score indicative of the number of people in the tile at a given time. In some embodiments, each tile-time record includes an indication of the duration of time described by the record (e.g., lunch time on Sundays, or dinnertime on Wednesdays) and a plurality of attribute records, each attribute record describing an attribute of the tile at the corresponding window of time during some cycle (e.g., weekly).

The attributes may be descriptions of activities in which users engage that are potentially of interest to consumers of the user-profile datastore 214. For example, some advertisers may be interested in when and where users go to particular types of restaurants, when and where users play golf, when and where users watch sports, when and where users fish, or when and where users work in particular categories of jobs. In some embodiments, each tile-time record may include a relatively large number of attribute records, for example more than 10, more than 100, more than 1000, or approximately 4000 attribute records, depending upon the desired specificity with which the tiles are to be described. Each attribute record may include an indicator of the attribute being characterized and an attribute score indicating the degree to which users tend to engage in activities corresponding to the attribute in the corresponding tile at the corresponding duration of time. In some cases, the attribute score (or tile-time record) is characterized by a density score indicating the number of users expected to engage in the corresponding activity in the tile at the time.

Thus, to use some embodiments of the geographic information system 218, a query may be submitted to determine what sort of activities users engage in at a particular block in downtown New York during Friday evenings, and the geographic information system 218 may respond with the attribute records corresponding to that block at that time. Those attribute records may indicate a relatively high attribute score for high-end dining, indicating that users typically go to restaurants in this category at that time in this place, and a relatively low attribute score for playing golf, for example. Attribute scores may be normalized, for example to a value from 0 to 10, with the value indicating the propensity of users to exhibit behavior described by that attribute.
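The tile records, tile-time records, and attribute records described above, together with the query just discussed, might be modeled as in the following in-memory sketch. The class and method names are hypothetical, attribute records are simplified to a name-to-score dictionary, and a production system would likely use a database or spatial index rather than nested dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

TileId = Tuple[int, int]
TimeBin = Tuple[str, str]  # e.g. ("Friday", "evening")


@dataclass
class TileTimeRecord:
    density: float = 0.0
    # attribute name -> normalized score from 0 to 10
    attributes: Dict[str, float] = field(default_factory=dict)


@dataclass
class TileRecord:
    tile_id: TileId
    time_records: Dict[TimeBin, TileTimeRecord] = field(default_factory=dict)


class GeographicInformationSystem:
    """Toy stand-in for the geographic information system 218."""

    def __init__(self):
        self._tiles: Dict[TileId, TileRecord] = {}

    def add(self, record: TileRecord) -> None:
        self._tiles[record.tile_id] = record

    def query(self, tile_id: TileId, time_bin: TimeBin) -> Dict[str, float]:
        # Returns a copy of the attribute scores, or an empty dict when
        # the tile or the time division is unknown.
        tile = self._tiles.get(tile_id)
        if tile is None:
            return {}
        record = tile.time_records.get(time_bin)
        return dict(record.attributes) if record is not None else {}


gis = GeographicInformationSystem()
gis.add(TileRecord(
    tile_id=(10, 20),
    time_records={("Friday", "evening"): TileTimeRecord(
        density=350.0,
        attributes={"high-end dining": 9.0, "playing golf": 0.5})},
))
friday_evening = gis.query((10, 20), ("Friday", "evening"))
```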

The user profiler 212 may join the location histories and tile records implicated by locations in those location histories to generate user profiles. Thus, users may be characterized according to the attributes of the places those users visit at the time the user visits those places. The generated user profiles may then be stored by the user profiler 212 in the user-profile datastore 214, as described below. To this end, or others, some embodiments of the user profiler 212 include a location-history acquisition module 224, a location-attribute acquisition module 226, and a user-attribute updater 228 operative to generate user profiles.

The user profiler 212 may be constructed from one or more of the computers described below with reference to FIG. 9. These computers may include a tangible, non-transitory, machine-readable medium, such as various forms of memory storing instructions that when executed by one or more processors of these computers (or some other data processing apparatus) cause the computers to provide the functionality of the user profiler 212 described herein. The components of the user profiler 212 are illustrated as discrete functional blocks, but it should be noted that the hardware and software by which these functional blocks are implemented may be differently organized, for example, code or hardware for providing this functionality may be intermingled, subdivided, conjoined, or otherwise differently arranged.

The illustrated location-history acquisition module 224 may be configured to acquire location histories of mobile devices 216 via the Internet 220. The location histories may be acquired directly from the mobile devices 216, or the location histories may be acquired from various third parties, such as third parties hosting Web applications rendered on the mobile devices 216, third parties hosting servers to which location histories are communicated by apps on the mobile devices 216, or third parties providing network access to the mobile devices 216, such as cellular service providers, for example. The location-history acquisition module 224 may include a plurality of sub-modules for obtaining location histories from a plurality of different providers. These sub-modules may be configured to request, download, and parse location histories from a respective one of the different providers via application program interfaces provided by those providers. The sub-modules may normalize the location histories from the different providers, which may be in different formats, into a common format for use in subsequent processing. Location histories may be acquired periodically, for example monthly, weekly, or hourly, or more frequently.
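The normalization performed by the sub-modules can be illustrated with a short sketch. Both provider formats below are invented for the example (no real provider API is implied), as are the function names and the keys of the common format.

```python
from datetime import datetime, timezone


def parse_provider_a(row):
    """Hypothetical provider A: {'lat', 'lng', 'ts'} with epoch seconds."""
    return {
        "latitude": float(row["lat"]),
        "longitude": float(row["lng"]),
        "timestamp": datetime.fromtimestamp(row["ts"], tz=timezone.utc),
    }


def parse_provider_b(row):
    """Hypothetical provider B: {'position': 'lat,lon', 'time': ISO 8601}."""
    lat_text, lon_text = row["position"].split(",")
    return {
        "latitude": float(lat_text),
        "longitude": float(lon_text),
        "timestamp": datetime.fromisoformat(row["time"]),
    }


# One parser per provider, selected by a provider name (assumed keys).
PARSERS = {"provider_a": parse_provider_a, "provider_b": parse_provider_b}


def normalize(provider, rows):
    """Convert provider-specific rows into the common record format."""
    parser = PARSERS[provider]
    return [parser(row) for row in rows]


a_records = normalize("provider_a", [{"lat": "40.5", "lng": "-74.25", "ts": 0}])
b_records = normalize(
    "provider_b",
    [{"position": "40.5,-74.25", "time": "1970-01-01T00:00:00+00:00"}],
)
```

Keeping one small parser per provider means that adding a new source only requires registering another entry in the lookup table, while downstream processing sees a single format.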

The user profiler 212 of this embodiment further includes the location-attribute acquisition module 226. The module 226 may be configured to obtain attributes of locations identified based on the location histories acquired by the location-history acquisition module 224. For example, the module 226 may be configured to iterate through each location identified by each location history and query the geographic information system 218 for attributes of those locations at the time at which the user was at the corresponding location. In some cases, the location-attribute acquisition module 226 may also request attributes of adjacent locations, such as adjacent tiles, from the geographic information system 218 so that the user-attribute updater 228 can determine whether a signal from a given tile is consistent with that of surrounding tiles for assessing the reliability of various indications.
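The adjacent-tile check mentioned above is not specified in detail here, so the following is only one plausible sketch for square tiles indexed by (row, col). The eight-neighbor definition of adjacency, the comparison against the mean of neighboring scores, and the tolerance of 3.0 score points are all assumptions of the example.

```python
def adjacent_tiles(tile_id):
    """Return the eight tiles surrounding a square tile (row, col)."""
    row, col = tile_id
    return [
        (row + d_row, col + d_col)
        for d_row in (-1, 0, 1)
        for d_col in (-1, 0, 1)
        if (d_row, d_col) != (0, 0)
    ]


def is_consistent(score, neighbor_scores, tolerance=3.0):
    """Treat a tile's score as reliable when it is within `tolerance`
    of the mean score of its neighbors. With no neighbor data there is
    nothing to contradict the signal, so it is accepted."""
    if not neighbor_scores:
        return True
    mean = sum(neighbor_scores) / len(neighbor_scores)
    return abs(score - mean) <= tolerance
```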

The acquired location histories and location attributes may be provided by modules 224 and 226 to the user-attribute updater 228, which in some embodiments, is configured to generate user profiles based on this data. In some cases, the user-attribute updater 228 is operative to attach attributes of places visited by users to the profile of those users. These profiles may be stored by the user-attribute updater 228 in the user-profile datastore 214.
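A minimal sketch of attaching place attributes to a profile follows. It assumes each visit has already been reduced to a (tile, time division) pair, and it aggregates by simple averaging, which is an assumption of this example; the disclosure does not limit the updater to any particular aggregation. The function and parameter names are hypothetical.

```python
from collections import defaultdict


def build_profile(visits, tile_attributes):
    """Average the attribute scores of visited tiles per time division.

    visits: iterable of (tile_id, time_bin) pairs for one user.
    tile_attributes: dict mapping (tile_id, time_bin) to a dict of
        attribute name -> score.
    An attribute missing from a visited tile contributes zero to the
    average for that visit. Visits to unknown tiles are skipped.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for tile_id, time_bin in visits:
        attributes = tile_attributes.get((tile_id, time_bin))
        if attributes is None:
            continue
        counts[time_bin] += 1
        for name, score in attributes.items():
            sums[time_bin][name] += score
    return {
        time_bin: {name: total / counts[time_bin]
                   for name, total in by_name.items()}
        for time_bin, by_name in sums.items()
    }


tile_attributes = {
    ((1, 1), "fri-evening"): {"high-end dining": 8.0, "golf": 2.0},
    ((1, 2), "fri-evening"): {"high-end dining": 6.0},
}
profile = build_profile(
    [((1, 1), "fri-evening"), ((1, 2), "fri-evening"), ((9, 9), "fri-evening")],
    tile_attributes,
)
```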

The user-profile datastore 214 may be operative to store user profiles and, in some embodiments, address queries for data in the user profiles. The illustrated user-profile datastore 214 includes a plurality of user-profile records, each record corresponding to the profile of a given user or a given mobile device 216. Each user-profile record may include an identifier of the record (which may be a value otherwise uncorrelated with the identity of the user to enhance privacy), and an identifier of the source or sources of the location histories from which the profile was created such that subsequent location histories can be matched with the profile (e.g., an account associated with a special-purpose application, a cell phone number, or some other value, which may be hashed to enhance user privacy).
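As one illustration of hashing a source identifier so that later location histories can still be matched to a profile, the sketch below uses a keyed hash (HMAC-SHA-256) from the Python standard library. The choice of HMAC, the secret key, and the function name are assumptions of the example; the disclosure only states that the value may be hashed.

```python
import hashlib
import hmac


def hashed_source_id(source, raw_id, secret_key):
    """Return a stable hex digest for (source, raw_id).

    The same inputs always yield the same digest, so subsequent
    location histories from the same account or phone number map to
    the same profile without storing the raw value.
    """
    # A separator byte keeps ("ab", "c") distinct from ("a", "bc").
    message = source.encode("utf-8") + b"\x00" + raw_id.encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


key = b"example-secret-key"  # placeholder; a real key would be kept secret
first = hashed_source_id("cell-provider", "+15555550100", key)
second = hashed_source_id("cell-provider", "+15555550100", key)
other = hashed_source_id("cell-provider", "+15555550101", key)
```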

Each user-profile record may also include a plurality of profile-time records indicating attributes of the user profile at different times during some cycle of time (e.g., portions of the week or month, or during other periods like those described above with reference to the geographic information system 218). In some cases, the profile-time records may correspond to the same durations of time as those of the tile-time records described above. Each profile-time record may include an indication of the duration of time being described (e.g., Thursdays at dinnertime, or Saturday midmorning) and a plurality of profile attribute records, each profile attribute record indicating the propensity of the corresponding user to engage in an activity described by the attribute during the corresponding time of the profile-time record. The profile-time records may allow tracking of when users tend to engage in a given activity (time of day, day of week, week of year). In some embodiments, the profile attribute records correspond to the same set of attribute records described above with reference to the geographic information system 218. Each profile attribute record may include an indication of the attribute being characterized (e.g., attending a children's soccer game, having brunch at a fast-casual dining establishment, parent running errands, or shopping at a mall) and a score indicating the propensity of the user to engage in the activity at the corresponding time, such as a normalized value from 0 to 10. The attribute records may further include a sample size, indicative of the number of samples upon which the attribute score is based, for weighting new samples, and a measure of variance among these samples (e.g., a standard deviation) for identifying outliers.
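The sample size and variance fields lend themselves to an incremental update. The sketch below uses Welford's online algorithm, which is a choice made for this example rather than a method stated in the disclosure, to maintain a running mean score and standard deviation; the three-standard-deviation outlier rule and all names are likewise assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class ProfileAttribute:
    """Running score for one attribute in one profile-time record."""
    score: float = 0.0       # running mean of samples (0 to 10 scale)
    sample_size: int = 0
    m2: float = 0.0          # sum of squared deviations from the mean

    @property
    def std_dev(self) -> float:
        if self.sample_size < 2:
            return 0.0
        return math.sqrt(self.m2 / (self.sample_size - 1))

    def is_outlier(self, sample: float, z: float = 3.0) -> bool:
        # Only meaningful once some spread has been observed.
        spread = self.std_dev
        return spread > 0.0 and abs(sample - self.score) > z * spread

    def update(self, sample: float) -> None:
        # Each new sample is weighted by 1 / sample_size, so a score
        # backed by many samples moves less than a fresh one.
        self.sample_size += 1
        delta = sample - self.score
        self.score += delta / self.sample_size
        self.m2 += delta * (sample - self.score)


attribute = ProfileAttribute()
attribute.update(4.0)
attribute.update(6.0)
```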

As described below, the user-profile records may be used for a variety of purposes. For example, advertisers operating ad servers 222 may submit to the user-profile datastore 214 a query identifying one of the user-profile records, such as by the above-mentioned hashed value of a user account number or phone number, and the user-profile datastore 214 may respond with the attributes of the corresponding user at the current time. In some embodiments, to further enhance user privacy, queries may be submitted for a specific attribute to determine whether to serve an advertisement corresponding to the attribute, or a query may request a binary indication of whether the attribute score is above a threshold.
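The binary form of the query can be sketched as a function that reveals only whether a score exceeds a threshold, never the score itself. The nested-dictionary layout of the datastore and the function name are assumptions of this illustration.

```python
def attribute_above_threshold(profiles, hashed_id, time_bin, attribute, threshold):
    """Return True only when the stored score exceeds the threshold.

    profiles: dict of hashed_id -> time_bin -> attribute -> score.
    Unknown users, time divisions, or attributes yield False, so the
    caller cannot distinguish a missing record from a low score.
    """
    score = profiles.get(hashed_id, {}).get(time_bin, {}).get(attribute)
    return score is not None and score > threshold


profiles = {"abc123": {"sat-morning": {"golf": 8.5, "fishing": 2.0}}}
serve_golf_ad = attribute_above_threshold(
    profiles, "abc123", "sat-morning", "golf", 7.0)
```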

In another example, the user-profile datastore 214 may be used by the user profiler 212 to augment the records in the geographic information system 218. For example, an index may be created for each attribute that identifies tiles where users having relatively strong scores (e.g., above a threshold) for the respective attribute tend to co-occur at given times. These indices may correspond to heat maps (though no visual representation need be created) indicating where, for example, users interested in golf tend to be during various times of the day, such that advertisers can select advertisements based on this information. In some embodiments, an index may be created for each user attribute at each of the above-described divisions of time in the geographic information system 218, and these indices may be queried to provide relatively prompt responses relating to where users having a given attribute or combination of attributes tend to co-occur at various times. Precalculating the indices is expected to yield faster responses to such queries than generating responsive data at the time the query is received. For instance, using examples of these indices relating to fishing and employment in banking, an advertiser may determine that people who engage in fishing on the weekend and work in banking tend to drive relatively frequently along a particular stretch of road on Mondays during the evening commute, and that advertiser may purchase an advertisement for bass fishing boats on a billboard along that road in response. Other examples relating to customization of software and services and other forms of analysis are described in greater detail below.
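A precalculated index of this kind might be built as in the following sketch, which counts, for each attribute and time division, the distinct users in each tile whose score for that attribute exceeds a threshold. The input shapes, the threshold of 7.0, and the function name are assumptions of the example, and for simplicity each user has a single set of attribute scores rather than one per time division.

```python
from collections import defaultdict


def build_attribute_index(visits, profiles, threshold=7.0):
    """Build {(attribute, time_bin): {tile_id: distinct_user_count}}.

    visits: iterable of (user_id, tile_id, time_bin).
    profiles: dict of user_id -> {attribute: score}.
    """
    users_by_key = defaultdict(set)
    for user_id, tile_id, time_bin in visits:
        for attribute, score in profiles.get(user_id, {}).items():
            if score > threshold:
                users_by_key[(attribute, time_bin, tile_id)].add(user_id)

    index = defaultdict(dict)
    for (attribute, time_bin, tile_id), users in users_by_key.items():
        index[(attribute, time_bin)][tile_id] = len(users)
    return dict(index)


visits = [
    ("u1", (1, 1), "mon-evening"),
    ("u2", (1, 1), "mon-evening"),
    ("u1", (1, 1), "mon-evening"),  # repeat visit, counted once
    ("u3", (2, 2), "mon-evening"),
]
profiles = {
    "u1": {"fishing": 9.0},
    "u2": {"fishing": 8.0, "golf": 2.0},
    "u3": {"fishing": 3.0},
}
index = build_attribute_index(visits, profiles)
```

Because the counts are computed ahead of time, answering where users with a given attribute co-occur at a given time reduces to a dictionary lookup.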

In short, some embodiments of the computing environment 210 generate user profiles that are relatively privacy-friendly to users and consume relatively little effort on the part of users or others to create the profiles. These advantages are expected to yield a relatively comprehensive set of relatively high-resolution user profiles that may be used by advertisers and others seeking to provide information and services customized to the unique attributes of each user, facilitating the presentation of high-value and high-relevance advertisements and services to users while respecting the users' interest in privacy. That said, not all embodiments provide these benefits, and some embodiments may forgo some or all of these benefits in the interest of various engineering trade-offs relating to time, cost, and features.

FIG. 9 is a diagram that illustrates an exemplary computing system 1000 in accordance with embodiments of the present technique. Various portions of systems and methods described herein may include or be executed on one or more computer systems similar to computing system 1000. Further, processes and modules described herein may be executed by one or more processing systems similar to that of computing system 1000.

Computing system 1000 may include one or more processors (e.g., processors 1010 a-1010 n) coupled to system memory 1020, an input/output (I/O) device interface 1030, and a network interface 1040 via an input/output (I/O) interface 1050. A processor may include a single processor or a plurality of processors (e.g., distributed processors). A processor may be any suitable processor capable of executing or otherwise performing instructions. A processor may include a central processing unit (CPU) that carries out program instructions to perform the arithmetical, logical, and input/output operations of computing system 1000. A processor may execute code (e.g., processor firmware, a protocol stack, a database management system, an operating system, or a combination thereof) that creates an execution environment for program instructions. A processor may include a programmable processor. A processor may include general or special purpose microprocessors. A processor may receive instructions and data from a memory (e.g., system memory 1020). Computing system 1000 may be a uni-processor system including one processor (e.g., processor 1010 a), or a multi-processor system including any number of suitable processors (e.g., 1010 a-1010 n). Multiple processors may be employed to provide for parallel or sequential execution of one or more portions of the techniques described herein. Processes, such as logic flows, described herein may be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating corresponding output. Processes described herein may be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). Computing system 1000 may include a plurality of computing devices (e.g., distributed computer systems) to implement various processing functions.

I/O device interface 1030 may provide an interface for connection of one or more I/O devices 1060 to computer system 1000. I/O devices may include devices that receive input (e.g., from a user) or output information (e.g., to a user). I/O devices 1060 may include, for example, a graphical user interface presented on displays (e.g., a cathode ray tube (CRT) or liquid crystal display (LCD) monitor), pointing devices (e.g., a computer mouse or trackball), keyboards, keypads, touchpads, scanning devices, voice recognition devices, gesture recognition devices, printers, audio speakers, microphones, cameras, or the like. I/O devices 1060 may be connected to computer system 1000 through a wired or wireless connection. I/O devices 1060 may be connected to computer system 1000 from a remote location. I/O devices 1060 located on a remote computer system, for example, may be connected to computer system 1000 via a network and network interface 1040.

Network interface 1040 may include a network adapter that provides for connection of computer system 1000 to a network. Network interface 1040 may facilitate data exchange between computer system 1000 and other devices connected to the network. Network interface 1040 may support wired or wireless communication. The network may include an electronic communication network, such as the Internet, a local area network (LAN), a wide area network (WAN), a cellular communications network, or the like.

System memory 1020 may be configured to store program instructions 1100 or data 1110. Program instructions 1100 may be executable by a processor (e.g., one or more of processors 1010 a-1010 n) to implement one or more embodiments of the present techniques. Instructions 1100 may include modules of computer program instructions for implementing one or more techniques described herein with regard to various processing modules. Program instructions may include a computer program (which in certain forms is known as a program, software, software application, script, or code). A computer program may be written in a programming language, including compiled or interpreted languages, or declarative or procedural languages. A computer program may include a unit suitable for use in a computing environment, including as a stand-alone program, a module, a component, or a subroutine. A computer program may or may not correspond to a file in a file system. A program may be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program may be deployed to be executed on one or more computer processors located locally at one site or distributed across multiple remote sites and interconnected by a communication network.

System memory 1020 may include a tangible program carrier having program instructions stored thereon. A tangible program carrier may include a non-transitory computer readable storage medium. A non-transitory computer readable storage medium may include a machine readable storage device, a machine readable storage substrate, a memory device, or any combination thereof. Non-transitory computer readable storage medium may include non-volatile memory (e.g., flash memory, ROM, PROM, EPROM, EEPROM memory), volatile memory (e.g., random access memory (RAM), static random access memory (SRAM), synchronous dynamic RAM (SDRAM)), bulk storage memory (e.g., CD-ROM and/or DVD-ROM, hard-drives), or the like. System memory 1020 may include a non-transitory computer readable storage medium that may have program instructions stored thereon that are executable by a computer processor (e.g., one or more of processors 1010 a-1010 n) to implement the subject matter and the functional operations described herein. A memory (e.g., system memory 1020) may include a single memory device and/or a plurality of memory devices (e.g., distributed memory devices). Instructions or other program code to provide the functionality described herein may be stored on tangible, non-transitory computer readable media. In some cases, the entire set of instructions may be stored concurrently on the media, or in some cases, different parts of the instructions may be stored on the same media at different times, e.g., a copy may be created by writing program code to a first-in-first-out buffer in a network interface, where some of the instructions are pushed out of the buffer before other portions of the instructions are written to the buffer, with all of the instructions residing in memory on the buffer, just not all at the same time.

I/O interface 1050 may be configured to coordinate I/O traffic between processors 1010 a-1010 n, system memory 1020, network interface 1040, I/O devices 1060, and/or other peripheral devices. I/O interface 1050 may perform protocol, timing, or other data transformations to convert data signals from one component (e.g., system memory 1020) into a format suitable for use by another component (e.g., processors 1010 a-1010 n). I/O interface 1050 may include support for devices attached through various types of peripheral buses, such as a variant of the Peripheral Component Interconnect (PCI) bus standard or the Universal Serial Bus (USB) standard.

Embodiments of the techniques described herein may be implemented using a single instance of computer system 1000 or multiple computer systems 1000 configured to host different portions or instances of embodiments. Multiple computer systems 1000 may provide for parallel or sequential processing/execution of one or more portions of the techniques described herein.

Those skilled in the art will appreciate that computer system 1000 is merely illustrative and is not intended to limit the scope of the techniques described herein. Computer system 1000 may include any combination of devices or software that may perform or otherwise provide for the performance of the techniques described herein. For example, computer system 1000 may include or be a combination of a cloud-computing system, a data center, a server rack, a server, a virtual server, a desktop computer, a laptop computer, a tablet computer, a server device, a client device, a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a vehicle-mounted computer, or a Global Positioning System (GPS), or the like. Computer system 1000 may also be connected to other devices that are not illustrated, or may operate as a stand-alone system. In addition, the functionality provided by the illustrated components may in some embodiments be combined in fewer components or distributed in additional components. Similarly, in some embodiments, the functionality of some of the illustrated components may not be provided or other additional functionality may be available.

Those skilled in the art will also appreciate that while various items are illustrated as being stored in memory or on storage while being used, these items or portions of them may be transferred between memory and other storage devices for purposes of memory management and data integrity. Alternatively, in other embodiments some or all of the software components may execute in memory on another device and communicate with the illustrated computer system via inter-computer communication. Some or all of the system components or data structures may also be stored (e.g., as instructions or structured data) on a computer-accessible medium or a portable article to be read by an appropriate drive, various examples of which are described above. In some embodiments, instructions stored on a computer-accessible medium separate from computer system 1000 may be transmitted to computer system 1000 via transmission media or signals such as electrical, electromagnetic, or digital signals, conveyed via a communication medium such as a network or a wireless link. Various embodiments may further include receiving, sending, or storing instructions or data implemented in accordance with the foregoing description upon a computer-accessible medium. Accordingly, the present invention may be practiced with other computer system configurations.

In block diagrams, illustrated components are depicted as discrete functional blocks, but embodiments are not limited to systems in which the functionality described herein is organized as illustrated. The functionality provided by each of the components may be provided by software or hardware modules that are differently organized than is presently depicted, for example such software or hardware may be intermingled, conjoined, replicated, broken up, distributed (e.g., within a data center or geographically), or otherwise differently organized. The functionality described herein may be provided by one or more processors of one or more computers executing code stored on a tangible, non-transitory, machine readable medium. In some cases, third party content delivery networks may host some or all of the information conveyed over networks, in which case, to the extent information (e.g., content) is said to be supplied or otherwise provided, the information may be provided by sending instructions to retrieve that information from a content delivery network.

The reader should appreciate that the present application describes several inventions. Rather than separating those inventions into multiple isolated patent applications, applicants have grouped these inventions into a single document because their related subject matter lends itself to economies in the application process. But the distinct advantages and aspects of such inventions should not be conflated. In some cases, embodiments address all of the deficiencies noted herein, but it should be understood that the inventions are independently useful, and some embodiments address only a subset of such problems or offer other, unmentioned benefits that will be apparent to those of skill in the art reviewing the present disclosure. Due to cost constraints, some inventions disclosed herein may not be presently claimed and may be claimed in later filings, such as continuation applications or by amending the present claims. Similarly, due to space constraints, neither the Abstract nor the Summary of the Invention sections of the present document should be taken as containing a comprehensive listing of all such inventions or all aspects of such inventions.

It should be understood that the description and the drawings are not intended to limit the invention to the particular form disclosed, but to the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present invention as defined by the appended claims. Further modifications and alternative embodiments of various aspects of the invention will be apparent to those skilled in the art in view of this description. Accordingly, this description and the drawings are to be construed as illustrative only and are for the purpose of teaching those skilled in the art the general manner of carrying out the invention. It is to be understood that the forms of the invention shown and described herein are to be taken as examples of embodiments. Elements and materials may be substituted for those illustrated and described herein, parts and processes may be reversed or omitted, and certain features of the invention may be utilized independently, all as would be apparent to one skilled in the art after having the benefit of this description of the invention. Changes may be made in the elements described herein without departing from the spirit and scope of the invention as described in the following claims. Headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description.

As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). The words “include”, “including”, and “includes” and the like mean including, but not limited to. As used throughout this application, the singular forms “a,” “an,” and “the” include plural referents unless the content explicitly indicates otherwise. Thus, for example, reference to “an element” or “a element” includes a combination of two or more elements, notwithstanding use of other terms and phrases for one or more elements, such as “one or more.” The term “or” is, unless indicated otherwise, non-exclusive, i.e., encompassing both “and” and “or.” Terms describing conditional relationships, e.g., “in response to X, Y,” “upon X, Y,” “if X, Y,” “when X, Y,” and the like, encompass causal relationships in which the antecedent is a necessary causal condition, the antecedent is a sufficient causal condition, or the antecedent is a contributory causal condition of the consequent, e.g., “state X occurs upon condition Y obtaining” is generic to “X occurs solely upon Y” and “X occurs upon Y and Z.” Such conditional relationships are not limited to consequences that instantly follow the antecedent obtaining, as some consequences may be delayed, and in conditional statements, antecedents are connected to their consequents, e.g., the antecedent is relevant to the likelihood of the consequent occurring. Statements in which a plurality of attributes or functions are mapped to a plurality of objects (e.g., one or more processors performing steps A, B, C, and D) encompass both all such attributes or functions being mapped to all such objects and subsets of the attributes or functions being mapped to subsets of the attributes or functions (e.g., both all processors each performing steps A-D, and a case in which processor 1 performs step A, processor 2 performs step B and part of step C, and processor 3 performs part of step C and step D), unless otherwise indicated. Further, unless otherwise indicated, statements that one value or action is “based on” another condition or value encompass both instances in which the condition or value is the sole factor and instances in which the condition or value is one factor among a plurality of factors. Unless otherwise indicated, statements that “each” instance of some collection has some property should not be read to exclude cases where some otherwise identical or similar members of a larger collection do not have the property, i.e., each does not necessarily mean each and every. Limitations as to sequence of recited steps should not be read into the claims unless explicitly specified, e.g., with explicit language like “after performing X, performing Y,” in contrast to statements that might be improperly argued to imply sequence limitations, like “performing X on items, performing Y on the X'ed items,” used for purposes of making claims more readable rather than specifying sequence. Unless specifically stated otherwise, as apparent from the discussion, it is appreciated that throughout this specification discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining” or the like refer to actions or processes of a specific apparatus, such as a special purpose computer or a similar special purpose electronic processing/computing device.

In this patent, certain U.S. patents, U.S. patent applications, or other materials (e.g., articles) have been incorporated by reference. The text of such U.S. patents, U.S. patent applications, and other materials is, however, only incorporated by reference to the extent that no conflict exists between such material and the statements and drawings set forth herein. In the event of such conflict, the text of the present document governs.

The present techniques will be better understood with reference to the following enumerated embodiments:

1. A method of joining data from feeds from multiple sources of computing device network activity data having heterogenous device identifier namespaces and device identifier to device mappings that change over time, the method comprising: accessing, with one or more processors, three or more sources of network activity log data from three or more different sources of network activity data, wherein: each source of network activity log data describes network activity by more than 100,000 mobile computing devices, each source of network activity log data describes activities over a duration of time longer than one hour, each source of network activity log data provides transaction records of more than 1 million transactions by at least some of the mobile computing devices, each transaction record including one or more external-namespace device identifiers in an external namespace of a respective mobile computing device participating in the respective network transaction, and the transaction records associate geolocations reported by the mobile computing devices with timestamps and external-namespace device identifiers of the mobile computing devices; for each of the sources of network activity log data, based on the respective network activity log data, updating, with one or more processors, a multi-namespace mapping that maps the external-namespace device identifiers to internal-namespace device identifiers in an internal namespace of a system configured to profile mobile computing devices based on logged network activity data of the mobile computing devices, wherein: the namespace mapping comprises a plurality of external-namespace-specific mappings each mapping a respective type of device identifier in a respective external namespace used in the network activity log data to one or more internal-namespace device identifiers, and at least some of the external-namespace device identifiers are mapped in at least some of the external-namespace-specific mappings to a plurality of internal-namespace device identifiers, with a given external-namespace device identifier being mapped to a given plurality of internal-namespace device identifiers; after updating the multi-namespace mapping, receiving, with one or more processors, an external-namespace device identifier; selecting, with one or more processors, one of the external-namespace-specific mappings based on the external namespace of the received external-namespace device identifier; accessing, with one or more processors, a plurality of internal-namespace device identifiers mapped to the received external-namespace device identifier by the selected external-namespace-specific mapping; and accessing, with one or more processors, a device profile associated with at least some of the plurality of internal-namespace device identifiers.
2. The method of embodiment 1, wherein updating the multi-namespace mapping comprises, for the given external-namespace device identifier: determining to add a new internal-namespace device identifier mapping to the given external-namespace device identifier; and adding the new internal-namespace device identifier as a new branch to a graph of the given plurality of internal-namespace device identifiers.
3. The method of embodiment 2, wherein the graph of the given plurality of device identifiers in the internal namespace comprises: three or more internal-namespace device identifiers; and edges indicating that some of the three or more internal-namespace device identifiers are newer versions of others of the three or more internal-namespace device identifiers.
4. The method of embodiment 2, wherein the graph is an acyclic graph and wherein nodes of the graph are associated with scores indicative of a likelihood that the corresponding internal-namespace device identifier is correctly assigned to the given external-namespace device identifier.
5. The method of any of embodiments 1-4, wherein each plurality of internal-namespace device identifiers mapped to a single external-namespace device identifier are associated with links between respective pairs of the internal-namespace device identifiers indicating relationships between the plurality of internal-namespace device identifiers.
6. The method of any of embodiments 1-5, wherein at least one of the external-namespace-specific mappings includes an associative array comprising key-value pairs, wherein: keys of the key-value pairs are outputs of a hash function upon taking as an input an external-namespace identifier; the keys of the key-value pairs are values in a sequential index of the associative array; and values of the key-value pairs are internal-namespace device identifiers mapped to the corresponding external-namespace identifier that yields a hash function output of the corresponding key.
7. The method of any of embodiments 1-6, wherein at least one of the external-namespace-specific mappings is a tree in which branches represent different portions of the corresponding external namespace and at least some nodes are mapped to internal-namespace device identifiers corresponding to the portions of the corresponding external namespace.
8. The method of any of embodiments 1-7, wherein: at least some internal-namespace device identifiers appear in a plurality of different external-namespace-specific mappings; and at least some of the external-namespace-specific mappings correspond to different mobile operating systems from one another.
9. The method of any of embodiments 1-8, wherein: updating the multi-namespace mapping comprises updating a subset of the multi-namespace mapping without re-calculating a different subset of the multi-namespace mapping.
10. The method of any of embodiments 1-9, wherein: internal-namespace device identifiers mapped to external-namespace device identifiers change over time; and wherein both mappings from the external-namespace device identifiers to older and newer internal-namespace identifiers are maintained in the multi-namespace mapping after the changes.
11. The method of any of embodiments 1-10, wherein the external namespaces comprise: identifiers for advertising assigned by a first operating system; and advertising identifiers assigned by a second operating system different from the first operating system.
12. The method of any of embodiments 1-11, wherein: the external namespaces include more than 6 different external namespaces, each of the different external namespaces including identifiers for devices that are represented in at least one of the other external namespaces.
13. The method of any of embodiments 1-12, comprising: determining that the given plurality of internal-namespace device identifiers comprises more than a threshold amount of the device identifiers and, in response, omitting a corresponding mapping from responses to access requests for the given external-namespace device identifier.
14. The method of any of embodiments 1-13, comprising: determining that another given internal-namespace device identifier is mapped in a given pair of external namespaces; and determining that the given pair of external namespaces correspond to different operating systems and, in response, determining to not use at least one of the corresponding mappings.
15. The method of any of embodiments 1-14, wherein updating the multi-namespace mapping comprises steps for updating a multi-namespace mapping.
16. The method of any of embodiments 1-15, comprising: profiling one of the mobile computing devices based on geolocations associated in the transaction records with different external-namespace device identifiers and in the multi-namespace mapping with at least one of the same internal-namespace device identifiers; and storing a resulting profile in memory in association with one or more internal-namespace device identifiers of the one of the mobile computing devices.
17. The method of embodiment 16, comprising: sending a computing system configured to select content for delivery to mobile computing devices data indicative of the profile and one or more external-namespace device identifiers associated with the one of the mobile computing devices; receiving updated data from one of the sources of network activity log data from the computing system configured to select content for delivery to mobile computing devices; and re-updating the multi-namespace mapping based on the updated data.
18. The method of embodiment 17, comprising: receiving the updated data with means for handling real-time data feeds; processing messages based on the updated data from the means for handling real-time data feeds with a compute cluster having means for concurrently processing the messages with fault tolerance; updating the profile with means for profiling users; and storing the updated profile in an in-memory, non-relational database.
19. The method of embodiment 18, comprising: receiving an ad request; and responding to the ad request within less than 200 milliseconds of receiving the ad request by querying the profile in the in-memory, non-relational database and determining a response to the ad request based on query results.
20. A tangible, non-transitory, machine-readable medium storing instructions that when executed by a data processing apparatus cause the data processing apparatus to perform operations comprising: any of embodiments 1-19.
21. A system, comprising: one or more processors; and memory storing instructions that when executed by the processors cause the processors to effectuate operations comprising: any of embodiments 1-19.
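
By way of non-limiting illustration, the lookup path of embodiment 1, the branching of embodiments 2 and 4, and the threshold of embodiment 13 may be sketched in Python as follows. Everything named in the sketch is an assumption made for illustration and does not appear in the embodiments: the class names `InternalIdNode` and `MultiNamespaceMapping`, the `max_branches` threshold and its value, the namespace label `os1_ad_id`, the identifiers, the scores, and the profile contents. Nested dictionaries stand in for whatever data structure an embodiment actually uses.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class InternalIdNode:
    """A node in the graph of internal-namespace IDs for one external ID."""
    internal_id: str
    score: float                  # hypothetical likelihood the assignment is correct
    parent: Optional[str] = None  # older internal ID this node branches from


class MultiNamespaceMapping:
    """Hypothetical container holding one mapping per external namespace."""

    def __init__(self, max_branches: int = 3) -> None:
        # namespace -> external ID -> {internal ID -> node}
        self._by_namespace: Dict[str, Dict[str, Dict[str, InternalIdNode]]] = (
            defaultdict(lambda: defaultdict(dict))
        )
        self._max_branches = max_branches  # assumed threshold (embodiment 13)

    def add_branch(self, namespace: str, external_id: str, internal_id: str,
                   score: float, parent: Optional[str] = None) -> None:
        nodes = self._by_namespace[namespace][external_id]
        if internal_id in nodes:
            raise ValueError("internal ID is already in the graph")
        if parent is not None and parent not in nodes:
            raise KeyError("parent internal ID must already be in the graph")
        # A new node can only point at an existing node, so no cycle can form.
        nodes[internal_id] = InternalIdNode(internal_id, score, parent)

    def lookup(self, namespace: str, external_id: str) -> List[InternalIdNode]:
        # Select the namespace-specific mapping from the ID's namespace.
        mapping = self._by_namespace.get(namespace)
        if mapping is None:
            return []
        nodes = mapping.get(external_id, {})
        if len(nodes) > self._max_branches:
            return []  # over-branched mappings are omitted from responses
        return sorted(nodes.values(), key=lambda n: n.score, reverse=True)


# Usage: one external ID branches to two internal IDs, each with a profile.
mapping = MultiNamespaceMapping(max_branches=3)
mapping.add_branch("os1_ad_id", "ext-A", "int-1", score=0.9)
mapping.add_branch("os1_ad_id", "ext-A", "int-2", score=0.6, parent="int-1")
profiles = {"int-1": {"visits": ["cafe"]}, "int-2": {"visits": ["gym"]}}
hits = [profiles[n.internal_id] for n in mapping.lookup("os1_ad_id", "ext-A")]
```

In this sketch an update touches only the entry for one external identifier in one namespace, which is one way the partial update of embodiment 9 might be realized. Older internal identifiers are retained alongside newer branches, in the manner of embodiment 10.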

What is claimed is:
1. A method of joining data from feeds from multiple sources of computing device network activity data having heterogenous device identifier namespaces and device identifier to device mappings that change over time, the method comprising: accessing, with one or more processors, three or more sources of network activity log data from three or more different sources of network activity data, wherein: each source of network activity log data describes network activity by more than 100,000 mobile computing devices, each source of network activity log data describes activities over a duration of time longer than one hour, each source of network activity log data provides transaction records of more than 1 million transactions by at least some of the mobile computing devices, each transaction record including one or more external-namespace device identifiers in an external namespace of a respective mobile computing device participating in the respective network transaction, and the transaction records associate geolocations reported by the mobile computing devices with timestamps and external-namespace device identifiers of the mobile computing devices; for each of the sources of network activity log data, based on the respective network activity log data, updating, with one or more processors, a multi-namespace mapping that maps the external-namespace device identifiers to internal-namespace device identifiers in an internal namespace of a system configured to profile mobile computing devices based on geolocations in logged network activity data of the mobile computing devices, wherein: the namespace mapping comprises a plurality of external-namespace-specific mappings each mapping a respective type of device identifier in a respective external namespace used in the network activity log data to one or more internal-namespace device identifiers, and at least some of the external-namespace device identifiers are mapped in at least some of the external-namespace-specific mappings to a plurality of internal-namespace device identifiers, with a given external-namespace device identifier being mapped to a given plurality of internal-namespace device identifiers; after updating the multi-namespace mapping, receiving, with one or more processors, an external-namespace device identifier; selecting, with one or more processors, one of the external-namespace-specific mappings based on the external namespace of the received external-namespace device identifier; accessing, with one or more processors, a plurality of internal-namespace device identifiers mapped to the received external-namespace device identifier by the selected external-namespace-specific mapping; and accessing, with one or more processors, a device profile associated with at least some of the plurality of internal-namespace device identifiers.
2. The method of claim 1, wherein updating the multi-namespace mapping comprises, for the given external-namespace device identifier: determining to add a new internal-namespace device identifier mapping to the given external-namespace device identifier; and adding the new internal-namespace device identifier as a new branch to a graph of the given plurality of internal-namespace device identifiers.
3. The method of claim 2, wherein the graph of the given plurality of device identifiers in the internal namespace comprises: three or more internal-namespace device identifiers; and edges indicating that some of the three or more internal-namespace device identifiers are newer versions of others of the three or more internal-namespace device identifiers.
4. The method of claim 2, wherein the graph is an acyclic graph and wherein nodes of the graph are associated with scores indicative of a likelihood that the corresponding internal-namespace device identifier is correctly assigned to the given external-namespace device identifier.
5. The method of claim 1, wherein each plurality of internal-namespace device identifiers mapped to a single external-namespace device identifier are associated with links between respective pairs of the internal-namespace device identifiers indicating relationships between the plurality of internal-namespace device identifiers.
6. The method of claim 1, wherein at least one of the external-namespace-specific mappings includes an associative array comprising key-value pairs, wherein: keys of the key-value pairs are outputs of a hash function upon taking as an input an external-namespace identifier; the keys of the key-value pairs are values in a sequential index of the associative array; and values of the key-value pairs are internal-namespace device identifiers mapped to the corresponding external-namespace identifier that yields a hash function output of the corresponding key.
7. The method of claim 1, wherein at least one of the external-namespace-specific mappings is a tree in which branches represent different portions of the corresponding external namespace and at least some nodes are mapped to internal-namespace device identifiers corresponding to the portions of the corresponding external namespace.
8. The method of claim 1, wherein: at least some internal-namespace device identifiers appear in a plurality of different external-namespace-specific mappings; and at least some of the external-namespace-specific mappings correspond to different mobile operating systems from one another.
9. The method of claim 1, wherein: updating the multi-namespace mapping comprises updating a subset of the multi-namespace mapping without re-calculating a different subset of the multi-namespace mapping.
10. The method of claim 1, wherein: internal-namespace device identifiers mapped to external-namespace device identifiers change over time; and both mappings from the external-namespace device identifiers to older and newer internal-namespace identifiers are maintained in the multi-namespace mapping after the changes.
11. The method of claim 1, wherein the external namespaces comprise: identifiers for advertising assigned by a first operating system; and advertising identifiers assigned by a second operating system different from the first operating system.
12. The method of claim 1, wherein: the external namespaces include more than 6 different external namespaces, each of the different external namespaces including identifiers for devices that are represented in at least one of the other external namespaces.
13. The method of claim 1, comprising: determining that the given plurality of internal-namespace device identifiers comprises more than a threshold amount of the device identifiers and, in response, omitting a corresponding mapping from responses to access requests for the given external-namespace device identifier.
14. The method of claim 1, comprising: determining that another given internal-namespace device identifier is mapped in a given pair of external namespaces; and determining that the given pair of external namespaces correspond to different operating systems and, in response, determining to not use at least one of the corresponding mappings.
15. The method of claim 1, wherein updating the multi-namespace mapping comprises steps for updating a multi-namespace mapping.
16. The method of claim 1, comprising: profiling one of the mobile computing devices based on geolocations associated in the transaction records with different external-namespace device identifiers and in the multi-namespace mapping with at least one of the same internal-namespace device identifiers; and storing a resulting profile in memory in association with one or more internal-namespace device identifiers of the one of the mobile computing devices.
17. The method of claim 16, comprising: sending a computing system configured to select content for delivery to mobile computing devices data indicative of the profile and one or more external-namespace device identifiers associated with the one of the mobile computing devices; receiving updated data from one of the sources of network activity log data from the computing system configured to select content for delivery to mobile computing devices; and re-updating the multi-namespace mapping based on the updated data.
18. The method of claim 17, comprising: receiving the updated data with means for handling real-time data feeds; processing messages based on the updated data from the means for handling real-time data feeds with a compute cluster having means for concurrently processing the messages with fault tolerance; updating the profile with means for profiling users; and storing the updated profile in an in-memory, non-relational database.
19. The method of claim 18, comprising: receiving an ad request; and responding to the ad request within less than 200 milliseconds of receiving the ad request by querying the profile in the in-memory, non-relational database and determining a response to the ad request based on query results.
20. A system, comprising: one or more processors; and memory storing instructions that when executed by the processors cause the processors to effectuate operations comprising: accessing three or more sources of network activity log data from three or more different sources of network activity data, wherein: each source of network activity log data describes network activity by more than 100,000 mobile computing devices, each source of network activity log data describes activities over a duration of time longer than one hour, each source of network activity log data provides transaction records of more than 1 million transactions by at least some of the mobile computing devices, each transaction record including one or more external-namespace device identifiers in an external namespace of a respective mobile computing device participating in the respective network transaction, and the transaction records associate geolocations reported by the mobile computing devices with timestamps and external-namespace device identifiers of the mobile computing devices; for each of the sources of network activity log data, based on the respective network activity log data, updating a multi-namespace mapping that maps the external-namespace device identifiers to internal-namespace device identifiers in an internal namespace of a system configured to profile mobile computing devices based on geolocations in logged network activity data of the mobile computing devices, wherein: the namespace mapping comprises a plurality of external-namespace-specific mappings each mapping a respective type of device identifier in a respective external namespace used in the network activity log data to one or more internal-namespace device identifiers, and at least some of the external-namespace device identifiers are mapped in at least some of the external-namespace-specific mappings to a plurality of internal-namespace device identifiers, with a given external-namespace device identifier being mapped to a given plurality of internal-namespace device identifiers; after updating the multi-namespace mapping, receiving an external-namespace device identifier; selecting one of the external-namespace-specific mappings based on the external namespace of the received external-namespace device identifier; accessing a plurality of internal-namespace device identifiers mapped to the received external-namespace device identifier by the selected external-namespace-specific mapping; and accessing a device profile associated with at least some of the plurality of internal-namespace device identifiers.